How are the degrees of freedom calculated in plm:::vcovDC.plm?

I use a fixed-effects model with time and group fixed effects. Further, I want to calculate robust clustered standard errors. Therefore, I use coeftest(model, vcov = vcovDC(model)).
I do not understand how the degrees of freedom are calculated for the reported t-statistics. Does it use the same degrees of freedom as in the provided plm-fixed-effect model, or are they adjusted? Perhaps my question is rather: are the degrees of freedom adjusted when one uses clustered standard errors for a two-way fixed-effects model, or do they remain the same?

plm calculates an ordinary variance–covariance matrix (VCOV). When you use summary on your plm object (what you probably mean by "provided plm-fixed-effect model"), the method actually applied is plm:::summary.plm, which uses ordinary standard errors (SE) without degrees-of-freedom correction unless you set its vcov= argument (default NULL) to a VCOV calculated differently, e.g. with vcovHC or vcovDC.
You can use lmtest::coeftest(fit, vcov.=...) or call summary(fit, vcov=...) directly, as shown in the example below. Either way, the supplied VCOV only replaces the standard errors: by default (its df= argument), coeftest bases the t-test on the model's residual degrees of freedom, so the degrees of freedom are not adjusted for clustering.
Example
library(plm)
data(Cigar)
fit <- plm(sales ~ price, data=Cigar, effect="twoways", model="within",
index=c("state", "year"))
summary(fit)$coe
# same:
summary(fit, vcov=NULL)$coe ## default, ordinary SE
# Estimate Std. Error t-value Pr(>|t|)
# price -1.084712 0.07554847 -14.35782 1.640552e-43
Now, to get robust standard errors (without adjustment for clustering), we may use vcovHC and consider its type= argument. In ?sandwich::vcovCL we may read:
HC0 applies no small sample bias adjustment. HC1 applies a degrees of freedom-based correction, (n-1)/(n-k) where n is the number of observations and k is the number of explanatory or predictor variables in the model.
summary(fit, vcov=vcovHC)$coe
# same:
summary(fit, vcov=vcovHC(fit, type="HC0"))$coe ## robust SE
# Estimate Std. Error t-value Pr(>|t|)
# price -1.084712 0.2406786 -4.506889 7.168418e-06
summary(fit, vcov=vcovHC(fit, type="HC1"))$coe ## robust SE, df-corrected
# Estimate Std. Error t-value Pr(>|t|)
# price -1.084712 0.2407658 -4.505256 7.22292e-06
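To see what the type= argument changes, here is a minimal base-R sketch (plain OLS via lm on simulated data, not plm) of the HC0 sandwich estimator and an HC1 variant scaled by n/(n - k). That scaling factor is an assumption for illustration: the quoted sandwich documentation states (n-1)/(n-k), but the squared ratio of the two plm standard errors above (about 1.000725) appears consistent with n/(n - k) for the Cigar panel with n = 1380 and k = 1. In either case only the VCOV is rescaled; the p-value in the sketch is still computed from a t distribution with the model's residual degrees of freedom.

```r
# Base-R sketch (no plm): heteroskedasticity-robust sandwich VCOV for OLS,
# with and without an n / (n - k) small-sample factor (assumed for illustration).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 - 0.5 * x + rnorm(n, sd = abs(x) + 0.5)  # heteroskedastic errors
fit_lm <- lm(y ~ x)

X <- model.matrix(fit_lm)          # n x k design matrix
e <- residuals(fit_lm)             # OLS residuals
k <- ncol(X)
bread <- solve(crossprod(X))       # (X'X)^{-1}
meat  <- crossprod(X * e)          # sum_i e_i^2 x_i x_i'
V_HC0 <- bread %*% meat %*% bread
V_HC1 <- V_HC0 * n / (n - k)       # degrees-of-freedom-scaled variant

se_HC0 <- sqrt(diag(V_HC0))
se_HC1 <- sqrt(diag(V_HC1))

# The t-statistic changes with the VCOV, but the reference t distribution
# keeps the model's residual degrees of freedom, n - k.
t_HC1 <- coef(fit_lm) / se_HC1
p_HC1 <- 2 * pt(abs(t_HC1), df = df.residual(fit_lm), lower.tail = FALSE)
print(cbind(Estimate = coef(fit_lm), SE_HC0 = se_HC0, SE_HC1 = se_HC1,
            t = t_HC1, p = p_HC1))
```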
The same applies to vcovDC and its type= argument for robust standard errors, doubly adjusted for clustering on group and time:
summary(fit, vcov=vcovDC(fit))$coe
# same:
summary(fit, vcov=vcovDC(fit, type="HC0"))$coe ## double-cluster-robust SE
# Estimate Std. Error t-value Pr(>|t|)
# price -1.084712 0.2923507 -3.71031 0.0002157146
summary(fit, vcov=vcovDC(fit, type="HC1"))$coe ## double-cluster-robust SE, df-corrected
# Estimate Std. Error t-value Pr(>|t|)
# price -1.084712 0.2924567 -3.708966 0.0002168511
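A double-clustered VCOV is commonly built as the group-clustered VCOV plus the time-clustered VCOV minus the heteroskedasticity-robust (White) VCOV, so that the observation-level part is not counted twice (the formula of Cameron, Gelbach and Miller, and of Thompson). The base-R sketch below applies that formula to pooled OLS on simulated panel data, without small-sample factors. It is an assumption that plm's vcovDC follows the same construction on the within-transformed data, so check ?vcovDC before relying on details; cluster_meat() is a hypothetical helper introduced only here.

```r
# Base-R sketch: two-way (group and time) cluster-robust VCOV for pooled OLS,
# V_DC = V_group + V_time - V_white, all HC0-type (no small-sample factors).
set.seed(42)
n_g <- 20; n_t <- 15
panel <- expand.grid(group = factor(1:n_g), time = factor(1:n_t))
u_g <- rnorm(n_g)[panel$group]      # group-level shock (indexed by factor codes)
u_t <- rnorm(n_t)[panel$time]       # time-level shock
panel$x <- rnorm(nrow(panel)) + u_g
panel$y <- 2 + 0.5 * panel$x + u_g + u_t + rnorm(nrow(panel))

fit_ols <- lm(y ~ x, data = panel)
X <- model.matrix(fit_ols)
e <- residuals(fit_ols)
bread <- solve(crossprod(X))        # (X'X)^{-1}

# Meat for clustering on 'cl': sum over clusters of (X_c' e_c)(X_c' e_c)'
cluster_meat <- function(X, e, cl) crossprod(rowsum(X * e, group = cl))

V_group <- bread %*% cluster_meat(X, e, panel$group) %*% bread
V_time  <- bread %*% cluster_meat(X, e, panel$time)  %*% bread
V_white <- bread %*% crossprod(X * e) %*% bread
V_DC    <- V_group + V_time - V_white

print(sqrt(diag(V_DC)))             # double-cluster-robust standard errors
```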

Related

How to find intercept coefficients after setting restrictions with linearHypothesis

So I built my logistic regression model using glm(). When I display the summary I get this with values for each variable.
Coefficients:
Estimate Std. Error z value Pr(>|z|)
After setting a restriction using linearHypothesis, I only get results for the chi-squared statistic, residual df, and df. Am I able to see what the coefficients are for the model with the restriction?
library(car)
library(tidyverse)
library(dplyr)
logitmodel <- glm(x~ratio+x+y, family=binomial(link="logit"))
nullhypothesis <- "x=0"
restrictedmodel <- linearHypothesis(logitmodel, nullhypothesis)
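linearHypothesis() tests a restriction but does not refit the model, which is why no coefficients are reported. One way to see the coefficients under a restriction is to refit the model with the restriction imposed: drop the term for a restriction like x = 0, or fix the coefficient with offset() for x = c. The sketch uses hypothetical simulated data (data frame dat, response outcome, predictors ratio, x, z) in place of the asker's data. Note that comparing the refitted models gives a likelihood-ratio test, whereas linearHypothesis() on a glm computes a Wald test; the two are asymptotically equivalent but generally not numerically identical.

```r
# Hypothetical simulated data standing in for the asker's variables.
set.seed(7)
n <- 500
dat <- data.frame(ratio = runif(n), x = rnorm(n), z = rnorm(n))
dat$outcome <- rbinom(n, 1, plogis(-0.5 + 1.2 * dat$ratio + 0.3 * dat$z))

full <- glm(outcome ~ ratio + x + z, family = binomial(link = "logit"), data = dat)

# Restriction x = 0: drop x and refit; these are the restricted coefficients.
restricted <- glm(outcome ~ ratio + z, family = binomial(link = "logit"), data = dat)
print(coef(restricted))

# Restriction x = c (here c = 0.5): fix the coefficient via an offset.
c0 <- 0.5
restricted_c <- glm(outcome ~ ratio + z + offset(c0 * x), family = binomial, data = dat)
print(coef(restricted_c))

# Likelihood-ratio test of the restriction x = 0.
print(anova(restricted, full, test = "Chisq"))
```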

Residual standard error in survey package

I am trying to calculate the residual standard error of a linear regression model using the survey package. I am working with a complex design, and the sampling weight of the complex design is given by "weight" in the code below.
fitM1 <- lm(med~x1+x2,data=pop_sample,weight=weight)
fitM2 <- svyglm(med~x1+x2,data=pop_sample,design=design)
First, if I call "summary(fitM1)", I get the following:
Call: lm(formula=med~x1+x2,data=pop_sample,weights=weight)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001787 0.042194 0.042 0.966
x1 0.382709 0.061574 6.215 1.92e-09 ***
x2 0.958675 0.048483 19.773 < 2e-16 ***
Residual standard error: 9.231 on 272 degrees of freedom
Multiple R-squared: 0.8958, Adjusted R-squared: 0.8931
F-statistic: 334.1 on 7 and 272 DF, p-value: < 2.2e-16
Next, if I call "summary(fitM2)", I get the following:
summary(fitM2)
Call: svyglm(formula=med~x1+x2,data=pop_sample,design=design)
Survey design: svydesign(id=~id_cluster,strat=~id_stratum,weight=weight,data=pop_sample)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.001787 0.043388 0.041 0.967878
x1 0.382709 0.074755 5.120 0.000334 ***
x2 0.958675 0.041803 22.933 1.23e-10 ***
When using "lm", I can extract the residual standard error by calling:
fitMvariance <- summary(fitM1)$sigma^2
However, I can't find an analogous function for "svyglm" anywhere in the survey package. The point estimates are the same when comparing the two approaches, but the standard errors of the coefficients (and, presumably, the residual standard error of the model) are different.
Survey Analysis
Use the survey package in R to perform survey analysis; it offers a wide range of functions to calculate statistics like percentages, lower and upper CI bounds, population totals and RSE.
RSE
We can use the svyby function in the survey package to get all of these statistics, including the relative standard error (RSE):
library("survey")
design <- svydesign(ids=~id_cluster, strata=~id_stratum, weights=~weight, data=pop_sample)
svyby(~med, ~x1+x2, design, svytotal, deff=TRUE, verbose=TRUE,vartype=c("se","cv","cvpct","var"))
cvpct gives the coefficient of variation as a percentage, i.e. the relative standard error.
See ?svyby for further information.
Because svyglm is built on glm, not lm, the variance estimate is called $dispersion rather than $sigma.
> library(survey)
> data(api)
> dstrat<-svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
+ fpc = ~fpc)
> model<-svyglm(api00~ell+meals+mobility, design=dstrat)
> summary(model)$dispersion
variance SE
[1,] 5172 492.28
This is the estimate of $\sigma^2$, the population residual variance. In this example we actually have the whole population, so we can compare:
> popmodel<-lm(api00~ell+meals+mobility, data=apipop)
> summary(popmodel)$sigma
[1] 70.58365
> sqrt(5172)
[1] 71.91662
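The same naming difference can be seen in base R without the survey package: for a Gaussian glm, summary()$dispersion is the residual variance, i.e. the square of the residual standard error that summary.lm() reports as $sigma. A minimal sketch on simulated data with hypothetical weights; it only illustrates the naming and does not reproduce svyglm's design-based variance estimate.

```r
# Simulated data with hypothetical case weights.
set.seed(3)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), w = runif(100, 0.5, 2))
d$med <- 1 + 0.4 * d$x1 + d$x2 + rnorm(100)

fit_lm  <- lm(med ~ x1 + x2, data = d, weights = w)
fit_glm <- glm(med ~ x1 + x2, data = d, weights = w, family = gaussian)

summary(fit_lm)$sigma^2       # lm: residual standard error, squared
summary(fit_glm)$dispersion   # glm: the same residual variance
```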

How do I overcome this error in execution of lmer function: Error in if (REML) p else 0L : argument is not interpretable as logical?

I have tried the defensive methods discussed in this post to prevent this error, but it still doesn't go away.
model<-lmer(Proportion~Plot+Treatment+(1|Plot/Treatment),binomial,data=data)
Error in if (REML) p else 0L : argument is not interpretable as logical
tl;dr you should use glmer instead. Because you haven't named your arguments, R is interpreting them by position (order). lmer's third argument is REML, so R thinks you're specifying REML=binomial, which isn't a legitimate value. family is the third argument to glmer, so this would work (sort of: see below) if you used glmer, but it's usually safer to name the arguments explicitly if there's any possibility of getting confused.
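The argument-matching problem can be reproduced in base R. The two functions below are hypothetical stand-ins that copy only the leading argument names of lme4::lmer (formula, data, REML) and lme4::glmer (formula, data, family); they are not lme4 code. Because data= is named, the unnamed binomial fills the next free positional slot, and if() then fails on a function value with the same message as in the question.

```r
# Hypothetical stand-ins: only the leading argument names mirror
# lme4::lmer(formula, data = NULL, REML = TRUE, ...) and
# lme4::glmer(formula, data = NULL, family = gaussian, ...).
fake_lmer  <- function(formula, data = NULL, REML = TRUE, ...) list(REML = REML)
fake_glmer <- function(formula, data = NULL, family = gaussian, ...) list(family = family)

# data= is named, so the unnamed 'binomial' is matched to the third argument.
str(fake_lmer(y ~ x, binomial, data = "dd")$REML)     # a function, not TRUE/FALSE
str(fake_glmer(y ~ x, binomial, data = "dd")$family)  # binomial, as intended

# What happens when such a value reaches if(), as in lmer's internals:
tryCatch(if (binomial) 1 else 0, error = conditionMessage)
```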
A reproducible example would be nice, but:
model <- glmer(Proportion~Plot+Treatment+(1|Plot/Treatment),
family=binomial,data=data)
is a starting point. I foresee a few more problems though:
if your data are not Bernoulli (0/1) (which I'm guessing not since your response is called Proportion), then you need to include the total number sampled in each trial, e.g. by specifying a weights argument
you have Plot and Treatment as both fixed and as random-effect grouping variables in your model; that won't work. I see that Crawley really does suggest this in the R book (google books link).
Don't do it the way he suggests; it doesn't make any sense. Replicating:
library(RCurl)
url <- "https://raw.githubusercontent.com/jejoenje/Crawley/master/Data/insects.txt"
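## stringsAsFactors=TRUE so that levels<- below works (R >= 4.0 reads strings as character):
dd <- read.delim(text=getURL(url),header=TRUE,stringsAsFactors=TRUE)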
## fix typo because I'm obsessive:
levels(dd$treatment) <- c("control","sprayed")
library(lme4)
model <- glmer(cbind(dead,alive)~block+treatment+(1|block/treatment),
data=dd,family=binomial)
If we look at the among-group standard deviations, we see that they are (effectively) zero for both groups. It's exactly zero for block because block is already included in the fixed effects. It need not be zero for the treatment:block interaction (we have treatment, but not the interaction between block and treatment, in the fixed effects), but it is, because there's little among-treatment-within-block variation:
VarCorr(model)
## Groups Name Std.Dev.
## treatment:block (Intercept) 2.8736e-09
## block (Intercept) 0.0000e+00
Conceptually, it makes more sense to treat block as a random effect:
dd <- transform(dd,prop=dead/(alive+dead),ntot=alive+dead)
model1 <- glmer(prop~treatment+(1|block/treatment),
weights=ntot,
data=dd,family=binomial)
summary(model1)
## ...
## Formula: prop ~ treatment + (1 | block/treatment)
## Random effects:
## Groups Name Variance Std.Dev.
## treatment:block (Intercept) 0.02421 0.1556
## block (Intercept) 0.18769 0.4332
## Number of obs: 48, groups: treatment:block, 12; block, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1640 0.2042 -5.701 1.19e-08 ***
## treatmentsprayed 3.2434 0.1528 21.230 < 2e-16 ***
Sometimes you might want to treat it as a fixed effect:
model2 <- update(model1,.~treatment+block+(1|block:treatment))
summary(model2)
## Random effects:
## Groups Name Variance Std.Dev.
## block:treatment (Intercept) 5.216e-18 2.284e-09
## Number of obs: 48, groups: block:treatment, 12
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.5076 0.0739 -6.868 6.50e-12 ***
## treatmentsprayed 3.2676 0.1182 27.642 < 2e-16 ***
Now the block-by-treatment interaction variance is effectively zero (because block soaks up more variability if treated as a fixed effect). However, the estimated effect of spraying is very nearly identical.
If you're worried about overdispersion, you can add an individual-level random effect (or use MASS::glmmPQL; lme4 no longer fits quasi-likelihood models):
dd <- transform(dd,obs=factor(seq_len(nrow(dd))))
model3 <- update(model1,.~.+(1|obs))
summary(model3)
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 4.647e-01 6.817e-01
## treatment:block (Intercept) 1.138e-09 3.373e-05
## block (Intercept) 1.813e-01 4.258e-01
## Number of obs: 48, groups: obs, 48; treatment:block, 12; block, 6
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1807 0.2411 -4.897 9.74e-07 ***
## treatmentsprayed 3.3481 0.2457 13.626 < 2e-16 ***
The observation-level effect has effectively replaced the treatment-by-block interaction (which is now close to zero). Again, the estimated spraying effect has hardly changed (but its standard error is twice as large ...)

Using confint in R on dataset with NAs

For a null glmer() model, I would like to calculate the 95% CI of the intercept using the confint() function in R on a dataset that contains NAs. Below is the model summary:
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: cbind(df$Valid.detections, df$Missed.detections) ~ 1 + (1 | SUR.ID) + (1 | Unit)
Data: df
Control: glmerControl(calc.derivs = F, optCtrl = list(maxfun = 20000))
AIC BIC logLik deviance df.resid
21286.9 21305.4 -10640.4 21280.9 3549
Scaled residuals:
Min 1Q Median 3Q Max
-0.40089 -0.39994 -0.00010 0.02841 0.56340
Random effects:
Groups Name Variance Std.Dev.
Unit (Intercept) 2.237e+01 4.729e+00
SUR.ID (Intercept) 1.883e-10 1.372e-05
Number of obs: 3552, groups: Unit, 3552; SUR.ID, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.07468 0.08331 -24.9 <2e-16 ***
However, when I try to calculate 95% CIs for the intercept it returns an error:
Computing bootstrap confidence intervals ...
Error in if (const(t, min(1e-08, mean(t, na.rm = TRUE)/1e+06))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In bootMer(object, bootFun, nsim = nsim, ...) :
some bootstrap runs failed (200/200)
Timing stopped at: 42.47 0.289 45.836
I googled the error and warning messages for a solution and found this thread http://r-sig-mixed-models.r-project.narkive.com/3vst8TmK/r-sig-me-confint-mermod-method-boot-throws-error, in which Ben Bolker suggested that one way to work around this issue is to remove the rows with NAs before using the confint() function. I tried this, and no errors or warnings are returned, but I found that the calculated 95% CIs do not contain the intercept estimate.
> c0
2.5 % 97.5 %
-3.129255 -2.931859
The calculated CIs do contain the intercept estimate of the null model refitted with the NAs excluded before using confint(); however, I need to keep the NAs in, if possible. Any suggestions on how this would be possible?
Thank you for any help.
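Following the workaround in the linked thread, one way to keep the estimate and the interval consistent is to build the complete-case data set once and use it for both the fit and the interval, referring to columns by name together with data= rather than via df$... in the formula. The sketch swaps in a plain binomial glm() and a Wald interval from confint.default() for the glmer()/bootstrap setup in the question, with hypothetical simulated data (columns valid, missed); it illustrates the workflow only and does not keep the NA rows in the model.

```r
# Hypothetical simulated detection counts with some missing values.
set.seed(11)
n <- 300
dat <- data.frame(valid = rbinom(n, 20, 0.1))
dat$missed <- 20 - dat$valid
dat$valid[sample(n, 30)] <- NA        # introduce missing values

# Build the complete-case data once ...
cc <- dat[complete.cases(dat[, c("valid", "missed")]), ]

# ... and use it for both the fit and the interval.
fit0 <- glm(cbind(valid, missed) ~ 1, family = binomial, data = cc)
ci <- confint.default(fit0)           # Wald interval, used here for simplicity
print(cbind(estimate = coef(fit0), ci))
```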

Get variables from summary?

I want to grab the Standard Error column when I do summary on a linear regression model. The output is below:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.436954 0.616937 -13.676 < 2e-16 ***
x1 -0.138902 0.024247 -5.729 1.01e-08 ***
x2 0.005978 0.009142 0.654 0.51316
...
I just want the Std. Error column values stored into a vector. How would I go about doing so? I tried model$coefficients[,2] but that keeps giving me extra values. If anyone could help that would be great.
Say fit is the linear model, then summary(fit)$coefficients[,2] has the standard errors. Type ?summary.lm.
fit <- lm(y~x, myData)
summary(fit)$coefficients[,1] # the coefficients
summary(fit)$coefficients[,2] # the std. error in the coefficients
summary(fit)$coefficients[,3] # the t-values
summary(fit)$coefficients[,4] # the p-values
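A small sketch using the built-in mtcars data (not the asker's model): selecting the column by name is less fragile than selecting it by position, and the same values are the square roots of the diagonal of vcov(fit). For a glm the column is also named "Std. Error", so the same line applies there.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
ctab <- coef(summary(fit))     # same matrix as summary(fit)$coefficients
se <- ctab[, "Std. Error"]     # named vector of standard errors
se
sqrt(diag(vcov(fit)))          # equivalent: from the coefficient VCOV
```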
