Delta method and clustered standard errors - r

I have a question regarding how to apply the delta method when I have clustered standard errors. Consider the following dataset and (simple) regression (please note that this question is not necessarily about whether it makes sense to cluster around "vs" or about the correctness / usefulness of this regression).
# Use packages
library(multiwayvcov)
library(sandwich)
library(lmtest)
library(msm)
# Load the data
data(mtcars)
# Run the regression
model1 <- lm(mpg ~ cyl + gear + drat, data = mtcars)
# Calculate the variance-covariance matrix for standard errors clustered on "vs"
vcov <- cluster.vcov(model1, mtcars$vs)
coeftest(model1, vcov)
# Applying the delta method results in an error
g <- model1$coefficients[2] / model1$coefficients[1]
deltamethod(g, mean, cov = vcov, ses = TRUE)
# Error I get is this one: "Error in deltamethod(g, mean = g, cov = vcov, ses = TRUE) :
# Covariances should be a 1 by 1 matrix"
Now I want to calculate the standard error for the coefficient on cyl divided by the intercept, using my matrix of standard errors clustered on "vs" (i.e. the vcov matrix). Does anyone know how to do this? I looked at this page (https://rdrr.io/rforge/msm/man/deltamethod.html), but for some reason I got an error when applying it. I appreciate any help.

Just editing the deltamethod call to output an answer - I don't know if this answer actually makes sense for what you want to do.
deltamethod(g    = formula('~ x2 / x1'),
            mean = model1$coefficients,
            cov  = vcov,
            ses  = TRUE)
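As a cross-check, the same standard error can be computed by hand from the delta-method formula, i.e. the gradient of b_cyl / b_(Intercept) sandwiched around the clustered covariance matrix. This is a minimal sketch using the objects created above, not part of the original answer:
b    <- coef(model1)
# Gradient of x2/x1 with respect to (x1, x2, x3, x4) is (-x2/x1^2, 1/x1, 0, 0),
# where the coefficient order is (Intercept), cyl, gear, drat.
grad <- c(-b[["cyl"]] / b[["(Intercept)"]]^2, 1 / b[["(Intercept)"]], 0, 0)
se_g <- sqrt(as.numeric(t(grad) %*% vcov %*% grad))
se_g  # should match the deltamethod() result above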

Related

Quasi-Poisson mixed-effect model on overdispersed count data from multiple imputed datasets in R

I'm dealing with a problem that has three parts, each of which I can solve separately, but now I need to solve them together:
1. an extremely skewed, over-dispersed dependent count variable (the number of incidents while doing something),
2. the necessity to include random effects,
3. lots of missing values -> multiple imputation -> 10 imputed datasets.
To solve the first two parts, I chose a quasi-Poisson mixed-effect model. Since stats::glm can't include random effects (or I haven't figured out how) and lme4::glmer doesn't support the quasi families, I worked with glmer(family = "poisson") and then adjusted the std. errors, z statistics and p-values as recommended here and discussed here. So I basically turn a Poisson mixed-effect regression into a quasi-Poisson mixed-effect regression "by hand".
This is all good with one dataset. But I have 10 of them.
I roughly understand the procedure for analyzing multiply imputed datasets – 1. imputation, 2. model fitting, 3. pooling results (I'm using the mice library). I can do these steps for a Poisson regression but not for a quasi-Poisson mixed-effect regression. Is it even possible to A) pool across models based on a quasi-distribution, B) get residuals from a pooled object (class "mipo")? I'm not sure. Also, I'm not sure how to interpret the pooled results for mixed models (I don't see the random effects in the pooled output, although I've found this page, which I'm currently trying to work through).
Can I get some help, please? Any suggestions on how to complete the analysis (addressing all three issues above) would be highly appreciated.
An example of the data is here (repre_d_v1 and repre_all_data are stored in there), and below is the crucial part of my code.
library(dplyr); library(tidyr); library(tidyverse); library(lme4); library(broom.mixed); library(mice)
# please download "qP_data.RData" from the last link above and load them
## ===========================================================================================
# quasi-Poisson mixed model from single data set (this is OK)
# first run Poisson regression on df "repre_d_v1", then turn it into quasi-Poisson
modelSingle = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
                    family = "poisson",
                    data = repre_d_v1)
# I know there are some warnings but it's because I share only a modified subset of data with you (:
printCoefmat(coef(summary(modelSingle))) # unadjusted coefficient table
# define quasi-likelihood adjustment function
quasi_table = function(model, ctab = coef(summary(model))) {
  phi = sum(residuals(model, type = "pearson")^2) / df.residual(model)
  qctab = within(as.data.frame(ctab), {
    `Std. Error` = `Std. Error` * sqrt(phi)
    `z value` = Estimate / `Std. Error`
    `Pr(>|z|)` = 2 * pnorm(abs(`z value`), lower.tail = FALSE)
  })
  return(qctab)
}
printCoefmat(quasi_table(modelSingle)) # done, makes sense
## ===========================================================================================
# now let's work with more than one data set
# object "repre_all_data" of class "mids" contains 10 imputed data sets
# fit model using with() function, then pool()
modelMultiple = with(data = repre_all_data,
                     expr = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
                                  family = "poisson"))
summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
# this has quite a similar structure to coef(summary(someGLM))
# but I don't see where the random effects are
# and more importantly, I wanted a quasi-Poisson model, not just a Poisson model...
# ...but here it is not possible to use the quasi_table function (defined earlier)...
# ...because I can't compute "phi"
This seems reasonable, with the caveat that I'm only thinking about the computation, not whether this makes statistical sense. What I'm doing here is computing the dispersion for each of the individual fits and then applying it to the summary table, using a variant of the machinery that you posted above.
## compute dispersion values
phivec <- vapply(modelMultiple$analyses,
                 function(model) sum(residuals(model, type = "pearson")^2) / df.residual(model),
                 FUN.VALUE = numeric(1))
phi_mean <- mean(phivec)
ss <- summary(pool(modelMultiple))  # class "mipo" ("mipo.summary")
## adjust
qctab <- within(as.data.frame(ss), {
  std.error <- std.error * sqrt(phi_mean)
  statistic <- estimate / std.error
  p.value   <- 2 * pnorm(abs(statistic), lower.tail = FALSE)
})
The results look weird (dispersion < 1, all model results identical), but I'm assuming that's because you gave us a weird subset as a reproducible example ...
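If it helps, here is a quick sanity check on the adjustment, a small sketch using the objects created above; the ratio of adjusted to unadjusted standard errors should equal sqrt(phi_mean) for every term:
data.frame(term       = ss$term,
           se_poisson = ss$std.error,
           se_quasi   = qctab$std.error,
           ratio      = qctab$std.error / ss$std.error)  # should all equal sqrt(phi_mean)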

Gelman-Rubin statistic for MCMCglmm model in R

I have a multivariate model with this (approximate) form:
library(MCMCglmm)
mod.1 <- MCMCglmm(cbind(OFT1, MIS1, PC1, PC2) ~ trait - 1 + trait:sex + trait:date,
                  random = ~ us(trait):squirrel_id + us(trait):year,
                  rcov   = ~ us(trait):units,
                  family = c("gaussian", "gaussian", "gaussian", "gaussian"),
                  data   = final_MCMC,
                  prior  = prior.invgamma,
                  verbose = FALSE,
                  pr     = TRUE,     # this saves the BLUPs
                  nitt   = 103000,   # number of iterations
                  thin   = 100,      # interval at which the Markov chain is stored
                  burnin = 3000)
For publication purposes, I've been asked to report the Gelman-Rubin statistic to indicate that the model has converged.
I have been trying to run:
gelman.diag(mod.1)
But, I get this error:
Error in mcmc.list(x) : Arguments must be mcmc objects
Any suggestions on the proper approach? I assume that the error means I can't pass my mod.1 output through gelman.diag(), but I am not sure what it is I am supposed to put there instead? My knowledge is quite limited here, so I'd appreciate any and all help!
Note that I haven't added the data here, but I suspect the answer is more code syntax and not data related.
gelman.diag() requires an mcmc.list. If we run more than one chain (for example by fitting the model more than once), we can extract the 'Sol' component from each fit and place them in an mcmc.list (below, it is the same model fitted twice).
library(MCMCglmm)
model1 <- MCMCglmm(PO ~ 1, random = ~ FSfamily, data = PlodiaPO, verbose = FALSE,
                   nitt = 1300, burnin = 300, thin = 1)
model2 <- MCMCglmm(PO ~ 1, random = ~ FSfamily, data = PlodiaPO, verbose = FALSE,
                   nitt = 1300, burnin = 300, thin = 1)
mclist <- mcmc.list(model1$Sol, model2$Sol)
gelman.diag(mclist)
# gelman.diag(mclist)
#Potential scale reduction factors:
# Point est. Upper C.I.
#(Intercept) 1 1
According to the documentation, it seems to be applicable only when there is more than one MCMC chain:
Gelman and Rubin (1992) propose a general approach to monitoring convergence of MCMC output in which m > 1 parallel chains are run with starting values that are overdispersed relative to the posterior distribution.
The input x here is
x - An mcmc.list object with more than one chain, and with starting values that are overdispersed with respect to the posterior distribution.
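Applied to the model in the question, this means fitting it (at least) twice and comparing the chains. A hedged sketch, where mod.2 is assumed to be a second, independent fit made with the same call as mod.1:
# mod.1 and mod.2 are assumed to be two independent MCMCglmm fits of the model
# in the question. Because pr = TRUE stores the BLUPs in Sol as well,
# multivariate = FALSE keeps the diagnostic per parameter.
mclist.1 <- mcmc.list(mod.1$Sol, mod.2$Sol)
gd <- gelman.diag(mclist.1, multivariate = FALSE)
head(gd$psrf)   # point estimates and upper C.I.s, one row per parameter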

Manually calculating robust standard errors for pglm using gradient and hessian matrix

I want to manually calculate robust standard errors for a fixed-effects Poisson model produced using the pglm function which, unlike the plm function, does not support sandwich error matrices. This makes it impossible to use the standard vcovHC() function to calculate standard errors, since it does not know how to extract terms in a format it understands from the result of pglm (see Robust Standard Errors: Poisson Panel Regression (pglm, lmtest) and https://stats.stackexchange.com/questions/273152/vcovhc-heteroskedasticity-in-pooled-and-panel-probit).
Looking on the web, I found the following way to calculate cluster robust standard errors:
library(readstata13)
library(pglm)
library(lmtest)
library(MASS)
ships<-readstata13::read.dta13("http://www.stata-press.com/data/r13/ships.dta")
ships$lnservice=log(ships$service)
res1 <- pglm(accident ~ op_75_79 + co_65_69 + co_70_74 + co_75_79 + lnservice,
             family = poisson, data = ships, effect = "individual",
             model = "within", index = "ship")
summary(res1)
standard_se<-ginv(-res1$hessian)
coeftest(res1,standard_se)
# Similar to e(sample) in STATA
esample<-as.numeric(rownames(as.matrix(res1$gradientObs)))
fc <- ships[esample,]$ship #isolates the groups used in estimation
# Calculates the new Meat portion of our covariance matrix
m <- length(unique(fc))
k <- 5
u <- res1$gradientObs
u.clust <- matrix(NA, nrow = m, ncol = k)
for (j in 1:k) {
  u.clust[, j] <- tapply(u[, j], fc, sum)
}
cl.vcov <- ginv(-res1$hessian) %*% (t(u.clust) %*% u.clust) %*% ginv(-res1$hessian)
coeftest(res1, cl.vcov)
However, what I need are robust, not cluster-robust, standard errors. Could anyone shed some light on how to do that manually using the gradient and Hessian matrices?
Any help would be much appreciated. Thank you!
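For reference, here is a hedged sketch of the observation-level (heteroskedasticity-robust, non-clustered) sandwich built from the same pieces: it simply uses the per-observation score contributions directly instead of summing them within clusters. Whether this is the appropriate estimator for a fixed-effects Poisson panel is a separate statistical question.
# Observation-level sandwich: bread %*% meat %*% bread, where the "meat" is the
# outer product of the per-observation score contributions (no cluster sums).
u        <- res1$gradientObs          # n x k matrix of score contributions
bread    <- ginv(-res1$hessian)       # inverse of the negative Hessian
meat     <- t(u) %*% u                # outer product of the scores
rob.vcov <- bread %*% meat %*% bread
coeftest(res1, rob.vcov)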

Standard Error of the Regression for NLS Model

I am currently working on a non-linear analysis of various datasets using an nls model. In addition, I want to calculate the standard error of the regression for the nls model.
The formula of the standard error of regression:
n  <- nrow(na.omit(data))
SE <- sqrt(sum((pv - av)^2) / (n - 2))
where pv is the predicted value and av is the actual value.
I have a problem calculating the standard error. Should I calculate the predicted and actual values first? Are the values based on the dataset? Any help is highly appreciated. Thank you.
R provides this via sigma:
fm <- nls(demand ~ a + b * Time, BOD, start = list(a = 1, b = 1))
sigma(fm)
## [1] 3.085016
This would also work, since deviance gives the residual sum of squares:
sqrt(deviance(fm) / (nobs(fm) - length(coef(fm))))
## [1] 3.085016
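To connect this back to the formula in the question, the same value can be computed by hand from the predicted and actual values (a sketch using the BOD example fitted above):
pv <- fitted(fm)     # predicted values
av <- BOD$demand     # actual values
sqrt(sum((pv - av)^2) / (nobs(fm) - length(coef(fm))))
## should match sigma(fm) above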

Performing Anova on Bootstrapped Estimates from Quantile Regression

So I'm using the quantreg package in R to conduct quantile regression analyses to test how the effects of my predictors vary across the distribution of my outcome.
FML <- as.formula(outcome ~ VAR + c1 + c2 + c3)
quantiles <- c(0.25, 0.5, 0.75)
q.Result <- list()
for (i in quantiles) {
  i.no <- which(quantiles == i)
  q.Result[[i.no]] <- rq(FML, tau = i, data, method = "fn", na.action = na.omit)
}
Then I call anova.rq, which runs a Wald test on all the models and outputs a p-value for each covariate, telling me whether the effect of each covariate varies significantly across the distribution of my outcome.
anova.Result <- anova(q.Result[[1]], q.Result[[2]], q.Result[[3]], joint=FALSE)
That works just fine. However, for my particular data (and in general?), bootstrapping my estimates and their errors is preferable, which I conduct with a slight modification of the code above.
q.Result  <- rq(FML, tau = quantiles, data, method = "fn", na.action = na.omit)
q.Summary <- summary(q.Result, se = "boot", R = 10000, bsmethod = "mcmb",
                     covariance = TRUE)
Here's where I get stuck. quantreg currently cannot perform the anova (Wald) test on bootstrapped estimates. The package documentation specifically states that "extensions of the methods to be used in anova.rq should be made" regarding the bootstrapping method.
Looking at the details of the anova.rq method, I can see that it requires two components that are not present in the quantile model when bootstrapping.
1) Hinv (inverse Hessian matrix). The package documentation specifically states: "note that for se = "boot" there is no way to split the estimated covariance matrix into its sandwich constituent parts."
2) J, which, according to the documentation, is the "Unscaled Outer product of gradient matrix returned if cov=TRUE and se != "iid". The Huber sandwich is cov = tau (1-tau) Hinv %*% J %*% Hinv. as for the Hinv component, there is no J component when se == "boot". (Note that to make the Huber sandwich you need to add the tau (1-tau) mayonnaise yourself.)"
Can I calculate or estimate Hinv and J from the bootstrapped estimates? If not, what is the best way to proceed?
Any help on this is much appreciated. This is my first time posting a question here, though I've greatly benefited from the answers to other people's questions in the past.
For question 2: you can use the R argument for resampling. For example:
anova(object, ..., test = "Wald", joint = TRUE, score = "tau",
      se = "nid", R = 10000, trim = NULL)
Where R is the number of resampling replications for the anowar form of the test, used to estimate the reference distribution for the test statistic.
Just a heads up, you'll probably get a better response to your questions if you only include 1 question per post.
I consulted with a colleague, and he confirmed that it was unlikely that Hinv and J could be 'reverse' computed from bootstrapped estimates. However, we resolved that estimates from different taus could be compared using a Wald test, as follows.
From the summary object produced by
q.Summary <- summary(q.Result, se = "boot", R = 10000, bsmethod = "mcmb", covariance = TRUE)
you extract the bootstrapped beta values for the variable of interest (in this case VAR, the first covariate in FML) for each tau:
boot.Bs <- sapply(q.Summary, function (x) x[["B"]][,2])
B0 <- coef(summary(lm(FML, data)))[2, 1]  # extract the linear (OLS) estimate for VAR
Then compute the Wald statistic and get the p-value, with the number of quantiles as the degrees of freedom:
Wald <- sum(apply(boot.Bs, 2, function (x) ((mean(x)-B0)^2)/var(x)))
Pvalue <- pchisq(Wald, ncol(boot.Bs), lower=FALSE)
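To see the mechanics of that statistic in isolation, here is a tiny self-contained illustration with made-up numbers (purely hypothetical, not from the question's data): three taus, with the bootstrapped betas in columns.
set.seed(1)
toy.Bs <- cbind(rnorm(10000, 0.40, 0.05),   # tau = 0.25
                rnorm(10000, 0.55, 0.06),   # tau = 0.50
                rnorm(10000, 0.70, 0.07))   # tau = 0.75
toy.B0 <- 0.55                              # hypothetical OLS estimate of the same coefficient
toy.Wald <- sum(apply(toy.Bs, 2, function(x) ((mean(x) - toy.B0)^2) / var(x)))
pchisq(toy.Wald, ncol(toy.Bs), lower.tail = FALSE)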
You also want to verify that the bootstrapped betas are normally distributed. If you're running many taus it can be cumbersome to check all those QQ plots, so you can just sum them by row:
qqnorm(apply(boot.Bs, 1, sum))
qqline(apply(boot.Bs, 1, sum), col = 2)
This seems to be working; if anyone can think of anything wrong with my solution, please share.
