Stargazer custom confidence intervals with multiple models - r

Stargazer is exponentiating 'wrong' confidence intervals because it is using normal distribution instead of t-distribution. So one has to use custom confidence intervals (Stargazer Confidence Interval Incorrect?).
But how does one it with multiple models?
model1 <- glm(vs ~ mpg + hp, data = mtcars, family = 'binomial')
model2 <- glm(vs ~ mpg + disp, data = mtcars, family = 'binomial')
library(stargazer)
stargazer(model1,
apply.coef = exp,
digits = 3,
ci = T,
t.auto = F,
type = "text",
ci.custom = list(exp(confint(model1))))
This works as intended. But when I am adding
ci.custom = list(exp(confint(model1, model2))))
then I'll get
Error in Pnames[which] : invalid subscript type 'list'
I tried with c() but to no avail.

The documentation says
a list of two-column numeric matrices ...
so
cc <- lapply(list(model1, model2), function(x) exp(confint(x)))
stargazer(model1, model2,
...,
ci.custom = cc)
should work. (cc <- list(exp(confint(model1)), exp(confint(model2))) also works, and is a little more explicit, but won't generalize as well ...)
For what it's worth, the difference for generalized linear models between the default CIs and those provided by confint() is not a Normal-vs-Student-t distinction (this is different from the case in the linked answer about linear models) — it's the difference between Wald and profile likelihood confidence intervals. (There is some theory for finite-size corrections in GLMs, called Bartlett corrections, but they're not easy to compute/widely available.)

Related

r gamlss: prediction of z-scores using regression with multiple explanatory variables

The function centiles.pred is a great option to extract z-scores based on a gamlss model like in the following code:
library(gamlss)
FIT = gamlss(mpg ~ disp, data = mtcars, family = BCPE)
NEWDATA = data.frame(disp = 300, mpg = 17)
centiles.pred(FIT, xvalues = NEWDATA$disp, xname = "disp", yval = NEWDATA$mpg, type = "z-scores")
However, the help-page of centiles.pred says "A restriction of the function is that it applies to models with only one explanatory variable". In many cases, however, you have more than one explanatory variable as in the following example:
FIT = gamlss(mpg ~ disp + qsec, data = mtcars, family = BCPE)
My question is:
Is there a workable way to calculate z-scores and centiles (also according to the arguments family = "standard-centiles", and family = "centiles" in the function centiled.pred) from a gamlss model with more than one explanatory variable?
The function predictAll(FIT,newdata= )
gives the fitted parameters (mu, sigma, nu, tau)
for new values of display and qsec.
[See Stasinopoulos et al. (2017) page 143.]
Then with the fitted (mu, sigma, nu, tau)
use the cdf of BCPE (i.e. pBCPE) to find the probability (say p) of being below your corresponding new value of mpg.
The z-score is then given by
qNO(p)
NOTE: example code
NEWDATA = data.frame(disp = 300, qsec = 50)
params <- predictAll(FIT,newdata=NEWDATA)
For mpg = 17, then
p <- pBCPE(17, params$mu, params$sigma, params$nu, params$tau)
The z-score is then given by
z <- qNO(p)
qNo is the inverse cdf of the standard normal distribution.

make R report adjusted R squared and F-test in output with robust standard errors

I have estimated a linear regression model using lm(x~y1 + y1 + ... + yn) and to counter the present heteroscedasticity I had R estimate the robust standard errors with
coeftest(model, vcov = vcovHC(model, type = "HC0"))
I know that (robust) R squared and F statistic from the "normal" model are still valid, but how do I get R to report them in the output? I want to fuse several regression output from different specifications together with stargazer and it would become very chaotic if I had to enter the non-robust model along just to get these statistics. Ideally I want to enter a regression output into stargazer that contains these statistics, thus importing it to their framework.
Thanks in advance for all answers
I don't have a solution with stargarzer, but I do have a couple of viable alternatives for regression tables with robust standard errors:
Option 1
Use the modelsummary package to make your tables.
it has a statistic_override argument which allows you to supply a function that calculates a robust variance covariance matrix (e.g., sandwich::vcovHC.
library(modelsummary)
library(sandwich)
mod1 <- lm(drat ~ mpg, mtcars)
mod2 <- lm(drat ~ mpg + vs, mtcars)
mod3 <- lm(drat ~ mpg + vs + hp, mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models, statistic_override = vcovHC)
Note 1: The screenshot above is from an HTML table, but the modelsummary package can also save Word, LaTeX or markdown tables.
Note 2: I am the author of this package, so please treat this as a potentially biased view.
Option 2
Use the estimatr::lm_robust function, which automatically includes robust standard errors. I believe that estimatr is supported by stargazer, but I know that it is supported by modelsummary.
library(estimatr)
mod1 <- lm_robust(drat ~ mpg, mtcars)
mod2 <- lm_robust(drat ~ mpg + vs, mtcars)
mod3 <- lm_robust(drat ~ mpg + vs + hp, mtcars)
models <- list(mod1, mod2, mod3)
modelsummary(models)
This is how to go about it. You need to use model object that is supported by stargazer as a template and then you can provide a list with standard errors to be used:
library(dplyr)
library(lmtest)
library(stargazer)
# Basic Model ---------------------------------------------------------------------------------
model1 <- lm(hp ~ factor(gear) + qsec + cyl + factor(am), data = mtcars)
summary(model1)
# Robust standard Errors ----------------------------------------------------------------------
model_robust <- coeftest(model1, vcov = vcovHC(model1, type = "HC0"))
# Get robust standard Errors (sqrt of diagonal element of variance-covariance matrix)
se = vcovHC(model1, type = "HC0") %>% diag() %>% sqrt()
stargazer(model1, model1,
se = list(NULL, se), type = 'text')
Using this approach you can use stargazer even for model objects that are not supported. You only need coefficients, standard errors and p-values as vectors. Then you can 'mechanically insert' even unsupported models.
One last Note. You are correct that once heteroskedasticity is present, Rsquared can still be used. However, overall F-test as well as t-tests are NOT valid anymore.

Calculating piecewise quantile linear regression with segmented package R

I am looking for a way to obtain the piecewise quantile linear regression with R. I have been able to compute the Quantile regression with the package quantreg. However, I don't want just 1 unique slope but want to check for breakpoints in my dataset. I have seen that the segmented package can do so. While it works good if the fit is carried out with lm or glm (as shown below in an example), it doesn't manage to work for quantile.
On the segmented package info I have read that there is a segmented.default which can be used for specific regression models, such as Quantiles. However, when I apply it for my quantile outcome it gives me the following errors:
Error in diag(vv) : invalid 'nrow' value (too large or NA)
In addition: Warning message:
cannot compute the covariance matrix
If instead of using K=2 I use for example psi I get other type of errors:
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix
I have created an example with the mtcars data so you can see the errors that I get.
library(quantreg)
library(segmented)
data(mtcars)
out.rq <- rq(mpg ~ wt, data= mtcars)
out.lm <- lm(mpg ~ wt, data= mtcars)
# Plotting the results
plot(mpg ~ wt, data = mtcars, pch = 1, main = "mpg ~ wt")
abline(out.lm, col = "red", lty = 2)
abline(out.rq, col = "blue", lty = 2)
legend("topright", legend = c("linear", "quantile"), col = c("red", "blue"), lty = 2)
#Generating segmented LM
o <- segmented(out.lm, seg.Z= ~wt, npsi=2, control=seg.control(display=FALSE))
plot(o, lwd=2, col=2:6, main="Segmented regression", res=FALSE) #lwd: line width #col: from 2 to 6 #RES: show datapoints
#Generating segmented Quantile
#using K=2
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, control=seg.control(display=FALSE, K=2))
# using psi
o.quantile <- segmented.default(out.rq, seg.Z= ~wt, psi=list(wt=c(2,4)), control=seg.control(display=FALSE))
I came across this post after a long time because I have the same issue. Just in case others might be stuck with the problem in the future, I wanted to point out what the problem is.
I examined "segmented.default". There is a line in the source code as follows:
Cov <- try(vcov(objF), silent = TRUE)
vcov is used to calculate the covariance matrix but does not work for quantile regression object objF. To get the covariance matrix for quantile regression, you need:
summary(objF,se="boot",cov=TRUE)$cov
Here, I used bootstrap method to compute the covariance matrix by selecting se="boot" but you should choose the appropriate method for you. Check ?summary.rq then "se" section for different methods.
Additionally, you need to assign the row/column names as follows:
dimnames(Cov)[[1]] <- dimnames(Cov)[[2]] <- unlist(attributes(objF$coef))
After modifying the function, it worked for me.
Maybe the other answer isn't particularly clean, as you need to modify a package function.
Additionally, maybe boot isn't such a good idea for SEs, according to this answer.
To get it working a bit easier, add a function to your workspace:
vcov.rq <- function(object, ...) {
result = summary(object, se = "nid", covariance = TRUE)$cov
rownames(result) = colnames(result) = names(coef(object))
return(result)
}
Caveats from the Cross-Validated link apply.

GLMMadaptive for semi-continuous data

I am dealing with a very hard-to-work data set: fish larval density. It is a semicontinuous data, with 90% of zeros and a right-skewed distribution, with few very huge values. I would like, for example, to make some predictions about enviromental features and and larval density. I am trying to use a two part model (GLMMadaptive for semicontinuous data), family = hurdle.lognormal().
But the command summary does not work with models fitted with mixed_model(), family = hurdle.lognormal(). So, I don't know how to get standard errors, p-values and confidence intervals for my predictors.
Another question is related to Goodness of Fit for the residuals. How can I look for it?
Also, I tried to fit a null model, without fixed effects, looking for model significance, but I couldn't fix it, because it gives me the following message:
Error in .subset2(x, i, exact = exact) : subscript out of bounds
Nullmodel <- mixed_model(fixed = Dprochilodus ~ 1, random = ~ 1|periodo, data = OeL_final, family = hurdle.lognormal(), max_coef_value = 30)
mymodel <- mixed_model(fixed = Dprochilodus ~ ponto+Dif_his.y+temp, random = ~ 1 | periodo, data = OeL_final, family = hurdle.lognormal(), n_phis = 1, zi_fixed = ~ ponto, max_coef_value = 30)
The results of my model are:
Call: mixed_model(fixed = logDprochilodus ~ ponto + Dif_his.y + temp,
random = ~1 | periodo, data = OeL_final, family = hurdle.lognormal(),
zi_fixed = ~ponto, n_phis = 1, max_coef_value = 30)
Model: family: hurdle log-normal link: identity
Random effects covariance matrix:
StdDev (Intercept) 0.05366623
Fixed effects: (Intercept) pontoIR pontoITA pontoJEQ pontoTB Dif_his.y temp
3.781147e-01 -1.161167e-09 3.660306e-01 -1.273341e+00 -5.834588e-01 1.374241e+00 -4.010771e-02
Zero-part coefficients: (Intercept) pontoIR pontoITA pontoJEQ pontoTB
1.4522523 21.3761790 3.3013379 1.1504374 0.2031707
Residual std. dev.:
1.240212
log-Lik: -216.3266
Have some one worked with this kind of model?? I really appreciate any help!
The summary() method should work with family = hurdle.lognormal(). For example, you can call summary() in the example posted here.
To check the goodness-of-fit you could use the simulated scale residuals provided from the DHARMa package; for an example check here.
If you are working in Rstudio console you may need to print(summary())

Stepwise Regression with ROC

I am learning data science with R on DataCamp. In one exercise, I have to build a stepwise regression model. Even though I create the stepwise model successfully, roc() function doesn't accept the response and it gives an error like: "'response' has more than two levels. Consider setting 'levels' explicitly or using 'multiclass.roc' instead"
I want to learn how to handle this problem so I wrote my code below.
# Specify a null model with no predictors
null_model <- glm(donated ~ 1, data = donors, family = "binomial")
# Specify the full model using all of the potential predictors
full_model <- glm(donated ~ ., data = donors, family = "binomial")
# Use a forward stepwise algorithm to build a parsimonious model
step_model <- step(null_model, scope = list(lower = null_model, upper = full_model), direction = "forward")
# Estimate the stepwise donation probability
step_prob <- predict(step_model, type = "response")
# Plot the ROC of the stepwise model
library(pROC)
ROC <- roc( step_prob, donors$donated)
plot(ROC, col = "red")
auc(ROC)
I changed the order of roc function's argument and the error was solved.
library(pROC)
ROC <- roc( donors$donated, step_prob)
plot(ROC, col = "red")
auc(ROC)

Resources