R: Robust SEs and model diagnostics in a stargazer table

I am trying to put some 2SLS regression outputs generated via ivreg() from the AER package into a LaTeX document using the stargazer package. I have a couple of problems, however, that I can't seem to solve myself.
First, I can't figure out how to insert the model diagnostics provided by the summary of ivreg(), namely the weak instruments test, the Wu-Hausman test, and the Sargan test. I would like to have them underneath the table with the statistics usually reported there, like the number of observations, R-squared, and residual SE. The stargazer function doesn't seem to have an argument where you can provide a list with additional diagnostics. I didn't put this into my example because I honestly have no clue where to begin.
Second, I want to exchange the normal standard errors for robust standard errors, and the only way I found to do this is producing objects with robust standard errors and adding them to the stargazer() call with se = list(). I put this into the minimal working example below. Is there a more elegant way to code this, or perhaps a way to re-estimate the model and save it with robust standard errors?
library(AER)
library(stargazer)

y <- rnorm(100, 5, 10)
x <- rnorm(100, 3, 15)
z <- rnorm(100, 3, 7)
a <- rnorm(100, 1, 7)
b <- rnorm(100, 3, 5)

# Fitting IV models
fit1 <- ivreg(y ~ x + a |
                a + z,
              model = TRUE)
fit2 <- ivreg(y ~ x + a |
                a + b + z,
              model = TRUE)

# Here are the SEs and the diagnostics I want
summary(fit1, vcov = sandwich, diagnostics = TRUE)
summary(fit2, vcov = sandwich, diagnostics = TRUE)

# Getting robust SEs; I think HC0 is the standard
# used with "vcov = sandwich" in the above summary
cov1 <- vcovHC(fit1, type = "HC0")
robust1 <- sqrt(diag(cov1))
cov2 <- vcovHC(fit2, type = "HC0")
robust2 <- sqrt(diag(cov2))  # note: cov2, not cov1

# Create LaTeX table
stargazer(fit1, fit2, type = "latex", se = list(robust1, robust2))

Here's one way to do what you want:
library(lmtest)

rob.fit1 <- coeftest(fit1, function(x) vcovHC(x, type = "HC0"))
rob.fit2 <- coeftest(fit2, function(x) vcovHC(x, type = "HC0"))
summ.fit1 <- summary(fit1, vcov. = function(x) vcovHC(x, type = "HC0"), diagnostics = TRUE)
summ.fit2 <- summary(fit2, vcov. = function(x) vcovHC(x, type = "HC0"), diagnostics = TRUE)

stargazer(fit1, fit2, type = "text",
          se = list(rob.fit1[, "Std. Error"], rob.fit2[, "Std. Error"]),
          add.lines = list(c(rownames(summ.fit1$diagnostics)[1],
                             round(summ.fit1$diagnostics[1, "p-value"], 2),
                             round(summ.fit2$diagnostics[1, "p-value"], 2)),
                           c(rownames(summ.fit1$diagnostics)[2],
                             round(summ.fit1$diagnostics[2, "p-value"], 2),
                             round(summ.fit2$diagnostics[2, "p-value"], 2))))
Which will yield:
==========================================================
                                   Dependent variable:
                               ----------------------------
                                            y
                                    (1)            (2)
----------------------------------------------------------
x                                 -1.222         -0.912
                                  (1.672)        (1.002)
a                                 -0.240         -0.208
                                  (0.301)        (0.243)
Constant                           9.662         8.450**
                                  (6.912)        (4.222)
----------------------------------------------------------
Weak instruments                   0.45           0.56
Wu-Hausman                         0.11           0.18
Observations                        100            100
R2                                -4.414         -2.458
Adjusted R2                       -4.526         -2.529
Residual Std. Error (df = 97)     22.075         17.641
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01
As you can see, this allows you to include the diagnostics manually for the respective models.
You could automate this approach by creating a function that takes in a list of models (e.g. list(summ.fit1, summ.fit2)) and outputs the objects required by the se or add.lines arguments.
gaze.coeft <- function(x, col = "Std. Error"){
  stopifnot(is.list(x))
  out <- lapply(x, function(y){
    y[, col]
  })
  return(out)
}
gaze.coeft(list(rob.fit1, rob.fit2))
gaze.coeft(list(rob.fit1, rob.fit2), col = 2)
Both calls take in a list of coeftest objects and yield the SE vectors as expected by se:
[[1]]
(Intercept)           x           a 
  6.9124587   1.6716076   0.3011226 

[[2]]
(Intercept)           x           a 
  4.2221491   1.0016012   0.2434801 
The same can be done for the diagnostics:
gaze.lines.ivreg.diagn <- function(x, col = "p-value", row = 1:3, digits = 2){
  stopifnot(is.list(x))
  out <- lapply(x, function(y){
    stopifnot(class(y) == "summary.ivreg")
    y$diagnostics[row, col, drop = FALSE]
  })
  out <- as.list(data.frame(t(as.data.frame(out)), check.names = FALSE))
  for(i in 1:length(out)){
    out[[i]] <- c(names(out)[i], round(out[[i]], digits = digits))
  }
  return(out)
}
gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), row = 1:2)
gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), col = 4, row = 1:2, digits = 2)
Both calls will yield:
$`Weak instruments`
[1] "Weak instruments" "0.45"             "0.56"            

$`Wu-Hausman`
[1] "Wu-Hausman" "0.11"       "0.18"      
Now the stargazer() call becomes as simple as this, yielding output identical to the above:
stargazer(fit1, fit2, type = "text",
          se = gaze.coeft(list(rob.fit1, rob.fit2)),
          add.lines = gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), row = 1:2))
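The question also asked about the Sargan test; that is row 3 of summary.ivreg's diagnostics table. It is only defined for the overidentified fit2 (fit1 is just identified), so expect an NA in the fit1 column. A small variation on the call above, reusing the same helpers:
# row = 1:3 adds the Sargan line; fit1 will show NA (just identified)
stargazer(fit1, fit2, type = "text",
          se = gaze.coeft(list(rob.fit1, rob.fit2)),
          add.lines = gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), row = 1:3))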

Related

Get standard errors from the output of the "ols" function

How do you get the standard errors of the coefficients from the output of the ols() function (package "rms") in R? I know that coef() gets the coefficients of the ols object, but I did not find a way to get the standard errors of those coefficients.
You should use summary():
# Example data
Fact1 <- runif(200)
Fact2 <- sample(0:3, 200, TRUE)
distance <- (Fact1 + Fact2/3 + rnorm(200))^2

d <- rms::datadist(Fact1, Fact2)
options(datadist = "d")  # rms needs this set for summary() to work

# Model
ols_model <- rms::ols(sqrt(distance) ~ rms::rcs(Fact1, 4) + rms::scored(Fact2), x = TRUE)

# Summary
model_summary <- summary(ols_model)

# Isolate SEs (column 5 of the summary matrix)
model_se <- model_summary[, 5]
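One caveat: summary() on an rms fit reports effect estimates (by default for an interquartile-range change) and their SEs, not the SEs of the raw coefficients. If you want per-coefficient standard errors aligned with coef(), a standard alternative that also works for rms fits is:
# SEs of the regression coefficients themselves
coef_se <- sqrt(diag(vcov(ols_model)))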

Does caret::train() in R have a standardized output across different fit methods/models?

I'm working with the train() function from the caret package to fit multiple regression and ML models and test their fit. I'd like to write a function that iterates through all model types and enters the best fit into a dataframe. The biggest issue is that caret doesn't provide all the model fit statistics that I'd like, so they need to be derived from the raw output. Based on my exploration, there doesn't seem to be a standardized way in which caret outputs each model's fit.
Another post (sorry, I don't have a link) created this function, which pulls from fit$results and fit$bestTune to get the pre-calculated RMSE, R^2, etc.
get_best_result <- function(caret_fit) {
  best <- which(rownames(caret_fit$results) == rownames(caret_fit$bestTune))
  best_result <- caret_fit$results[best, ]
  rownames(best_result) <- NULL
  best_result
}
One example of another fit statistic I need to calculate from the raw output is BIC. The two functions below do that. The residuals (y_actual - y_predicted) are needed, along with the number of x variables (k) and the number of rows used in the prediction (n). k and n must be derived from the output, not the original dataset, because models may drop x variables (feature selection) or rows (omitting NAs) depending on their algorithm.
calculate_MSE <- function(residuals){
  # residuals can be replaced with y_actual - y_predicted
  mse <- mean(residuals^2)
  return(mse)
}

calculate_BIC <- function(n, mse, k){
  BIC <- n * log(mse) + k * log(n)
  return(BIC)
}
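As a quick illustration of how these helpers plug together, here is a sketch on a built-in dataset (note the formula omits terms that are constant for a fixed sample size, so it will not numerically match stats::BIC):
# Example: BIC for a plain lm using the helpers above
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit_lm)
calculate_BIC(n = length(res),
              mse = calculate_MSE(res),
              k = length(coef(fit_lm)) - 1)  # k = number of x variables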
The real question is: is there a standardized output of caret::train() for the x variables, or for either y_actual, y_predicted, or the residuals?
I tried fit$finalModel$model and other approaches, but to no avail.
Here is a reproducible example along with the function I'm using. Please consider the functions above part of this reproducible example.
library(caret)
library(rlist)
library(data.table)

# data
df <- data.frame(y1 = rnorm(50, 0, 1),
                 y2 = rnorm(50, .25, 1.5),
                 x1 = rnorm(50, .4, .9),
                 x2 = rnorm(50, 0, 1.1),
                 x3 = rnorm(50, 1, .75))
missing_index <- sample(1:50, 7, replace = FALSE)
df[missing_index, ] <- NA

# function to fit models and pull results
fitModels <- function(df, Ys, Xs, models){
  # empty list
  results <- list()
  # number of for loops
  loops_counter <- 0
  # for every y
  for(y in 1:length(Ys)){
    # for every model
    for(m in 1:length(models)){
      # track loops
      loops_counter <- loops_counter + 1
      # fit the model
      set.seed(1) # seed for reproducibility
      fit <- tryCatch(train(as.formula(paste(Ys[y], paste(Xs, collapse = ' + '),
                                             sep = ' ~ ')),
                            data = df,
                            method = models[m],
                            na.action = na.omit,
                            tuneLength = 10),
                      error = function(e) {return(NA)})
      # pull results
      results[[loops_counter]] <- c(Y = Ys[y],
                                    model = models[m],
                                    sample_size = nrow(fit$finalModel$model),
                                    RMSE = get_best_result(fit)[[2]],
                                    R2 = get_best_result(fit)[[3]],
                                    MAE = get_best_result(fit)[[4]],
                                    BIC = calculate_BIC(n = length(fit$finalModel),
                                                        mse = calculate_MSE(fit$finalModel$residuals),
                                                        k = length(fit$finalModel$xNames)))
    }
  }
  # list bind
  results_df <- list.rbind(results)
  return(results_df)
}

linear_models <- c('lm', 'glmnet', 'ridge', 'lars', 'enet')
fits <- fitModels(df, c('y1', 'y2'), c('x1', 'x2', 'x3'), linear_models)
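No single residuals slot is standardized across caret methods, but two caret accessors get close without digging into method-specific fit$finalModel internals: the fit$trainingData slot (the data train() used, with the response stored as .outcome) and the predictors() generic. A hedged sketch (get_fit_pieces is a hypothetical helper; verify both accessors behave as expected for each method you use):
# Sketch: derive residuals, n, and k from caret's own slots/generics
get_fit_pieces <- function(caret_fit) {
  td <- na.omit(caret_fit$trainingData)    # rows usable after NA handling
  y_actual <- td$.outcome                  # caret stores the response here
  y_predicted <- predict(caret_fit, newdata = td)
  list(residuals = y_actual - y_predicted,
       n = nrow(td),
       k = length(predictors(caret_fit))) # predictor names kept by the model
}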

How to use the replicate function in R to repeat the function

I have a problem when using replicate to repeat a function.
I tried to use the bootstrap to fit a quadratic model, using concentration as the predictor and Total_lignin as the response, and I am going to report an estimate of the maximum with a corresponding standard error.
My idea was to create a function called bootFun that essentially does everything within one iteration of a for loop. bootFun takes in only the data set, the predictor, and the response to use (both variable names in quotes).
However, the SD comes out as 0, which cannot be correct. I do not know where the mistake is. Could you please help me with it?
# Load the libraries
library(dplyr)
library(tidyverse)

# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>%
  filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>%
  arrange(variety)

# Check the data
head(dat)

# sample size
n <- nrow(dat)

# A function to do one iteration
bootFun <- function(dat, pred, resp){
  # Draw the sample size from the dataset
  sample <- sample_n(dat, n, replace = TRUE)
  # A quadratic model fit
  formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
  fit <- lm(formula, data = sample)
  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])
  return(max)
}

max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin')

# Iterated times
N <- 5000

# Use 'replicate' function to do a loop
maxs <- replicate(N, max)

# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)
The base package boot, via its function boot, can ease the job of calling the bootstrap function repeatedly. The first argument must be the data set, the second argument is an indices argument that the user does not set, and other arguments can also be passed to it; in this case those other arguments are the predictor and the response names. (This also fixes the problem above: replicate(N, max) merely repeats the single value already stored in max, which is why the SD was 0.)
library(boot)

bootFun <- function(dat, indices, pred, resp){
  # Draw the sample from the dataset
  dat.sample <- dat[indices, ]
  # A quadratic model fit
  formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
  formula <- as.formula(formula)
  fit <- lm(formula, data = dat.sample)
  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])
  return(max)
}

N <- 5000
set.seed(1234) # Make the bootstrap results reproducible
results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')
results
#
# ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
# Call:
# boot(data = dat, statistic = bootFun, R = N, pred = "concentration",
#     resp = "Total_lignin")
#
#
# Bootstrap Statistics :
#       original        bias    std. error
# t1* -0.4629808 -0.0004433889  0.03014259
#
results$t0      # this is the statistic, not bootstrapped
# concentration
#    -0.4629808
mean(results$t) # bootstrap value
# [1] -0.4633233
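The standard error the question asked for is the std. error column printed above; boot computes it as the SD of the bootstrap replicates, so it can be extracted directly:
sd(results$t) # bootstrap SE of the estimated maximum
# [1] 0.03014259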
Note that to fit a polynomial, the function poly is much simpler than writing down the polynomial terms one by one explicitly:
formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')
Check the distribution of the bootstrapped statistic.
op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)
Test data
set.seed(2020) # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)

Multiply coefficients with standard deviation

In R, the stargazer package offers the possibility to apply functions to the coefficients, standard errors, etc:
library(foreign)
library(MASS)
library(stargazer)

dat <- read.dta("http://www.ats.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
m1 <- glm.nb(daysabs ~ math + prog, data = dat)
transform_coef <- function(x) (exp(x) - 1)
stargazer(m1, apply.coef = transform_coef)
How can I apply a function where the factor with which I multiply depends on the variable, like the standard deviation of that variable?
This may not be exactly what you hoped for, but you can transform the coefficients yourself and give stargazer a custom list of coefficients. For example, if you would like to report each coefficient times the standard deviation of its variable, the following extension of your example could work:
library(foreign)
library(stargazer)
library(MASS)

dat <- read.dta("http://www.ats.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
  prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
  id <- factor(id)
})
m1 <- glm.nb(daysabs ~ math + prog, data = dat)

# Store coefficients (and other coefficient stats)
s1 <- summary(m1)$coefficients

# Calculate standard deviations (using zero for the constant)
math.sd <- sd(dat$math)
acad.sd <- sd(as.numeric(dat$prog == "Academic"))
voc.sd  <- sd(as.numeric(dat$prog == "Vocational"))
int.sd  <- 0

# Append standard deviations to stored coefficients
StdDev <- c(int.sd, math.sd, acad.sd, voc.sd)
s1 <- cbind(s1, StdDev)

# Store custom list
new.coef <- s1[, "Estimate"] * s1[, "StdDev"]

# Output
stargazer(m1, coef = list(new.coef))
You may want to consider a couple of issues outside your original question about outputting coefficients in stargazer. Should you report the intercept when multiplying by the standard deviation? And will your standard errors and inference be the same under this transformation?
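On the second point: rescaling a coefficient by a constant rescales its standard error by the same constant, leaving t statistics and p-values unchanged. A minimal sketch that keeps the reported inference consistent with the rescaled coefficients (note the intercept's scaled SE is 0 here because its StdDev was set to 0, so the intercept row is omitted):
# Scale SEs by the same per-variable factors as the coefficients
new.se <- s1[, "Std. Error"] * s1[, "StdDev"]
stargazer(m1, coef = list(new.coef), se = list(new.se), omit = "Constant")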

Get coefficients estimated by maximum likelihood into a stargazer table

Stargazer produces very nice LaTeX tables for lm (and other) objects. Suppose I've fit a model by maximum likelihood. I'd like stargazer to produce an lm-like table for my estimates. How can I do this?
Although it's a bit hacky, one way might be to create a "fake" lm object containing my estimates; I think this would work as long as summary(my.fake.lm.object) works. Is that easily doable?
An example:
library(stargazer)

N <- 200
df <- data.frame(x = runif(N, 0, 50))
df$y <- 10 + 2 * df$x + 4 * rt(N, 4)  # True params
plot(df$x, df$y)

model1 <- lm(y ~ x, data = df)
stargazer(model1, title = "A Model")  # I'd like to produce a similar table for the model below

ll <- function(params) {
  ## Log likelihood for y ~ x + student's t errors
  params <- as.list(params)
  return(sum(dt((df$y - params$const - params$beta * df$x) / params$scale,
                df = params$degrees.freedom, log = TRUE) - log(params$scale)))
}
model2 <- optim(par = c(const = 5, beta = 1, scale = 3, degrees.freedom = 5),
                lower = c(-Inf, -Inf, 0.1, 0.1),
                fn = ll, method = "L-BFGS-B", control = list(fnscale = -1), hessian = TRUE)
model2.coefs <- data.frame(coefficient = names(model2$par), value = as.numeric(model2$par),
                           se = as.numeric(sqrt(diag(solve(-model2$hessian)))))
stargazer(model2.coefs, title = "Another Model", summary = FALSE)
# Works, but how can I mimic what stargazer does with lm objects?
To be more precise: with lm objects, stargazer nicely prints the dependent variable at the top of the table, includes SEs in parentheses below the corresponding estimates, and has the R^2 and number of observations at the bottom of the table. Is there a(n easy) way to obtain the same behavior with a "custom" model estimated by maximum likelihood, as above?
Here are my feeble attempts at dressing up my optim output as a lm object:
model2.lm <- list() # Mimic an lm object
class(model2.lm) <- c(class(model2.lm), "lm")
model2.lm$rank <- model1$rank # Problematic?
model2.lm$coefficients <- model2$par
names(model2.lm$coefficients)[1:2] <- names(model1$coefficients)
model2.lm$fitted.values <- model2$par["const"] + model2$par["beta"]*df$x
model2.lm$residuals <- df$y - model2.lm$fitted.values
model2.lm$model <- df
model2.lm$terms <- model1$terms # Problematic?
summary(model2.lm) # Not working
I was just having this problem and overcame it through the use of the coef, se, and omit arguments within stargazer, e.g.:
stargazer(regressions, ...,
          coef = list(...list of coefs...),
          se = list(...list of standard errors...),
          omit = c(sequence),
          covariate.labels = c("new names"),
          dep.var.labels.include = FALSE,
          notes.append = FALSE)
You need to first instantiate a dummy lm object, then dress it up:
# ...
model2.lm <- lm(y ~ ., data.frame(y = runif(5), beta = runif(5), scale = runif(5), degrees.freedom = runif(5)))
model2.lm$coefficients <- model2$par
model2.lm$fitted.values <- model2$par["const"] + model2$par["beta"] * df$x
model2.lm$residuals <- df$y - model2.lm$fitted.values
stargazer(model2.lm, se = list(model2.coefs$se), summary = FALSE, type = 'text')
# ===============================================
#                         Dependent variable:    
#                     ---------------------------
#                                  y             
# -----------------------------------------------
# const                        10.127***         
#                               (0.680)          
#                                                
# beta                          1.995***         
#                               (0.024)          
#                                                
# scale                         3.836***         
#                               (0.393)          
#                                                
# degrees.freedom               3.682***         
#                               (1.187)          
#                                                
# -----------------------------------------------
# Observations                    200            
# R2                             0.965           
# Adjusted R2                    0.858           
# Residual Std. Error       75.581 (df = 1)      
# F Statistic             9.076 (df = 3; 1)      
# ===============================================
# Note:               *p<0.1; **p<0.05; ***p<0.01
(And then, of course, make sure the remaining summary stats are correct.)
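Since the R2, F statistic, and residual SE in that table come from the dummy lm and are meaningless for the ML fit, one option is to suppress them via stargazer's documented omit.stat codes:
# Hide the lm-derived statistics that don't apply to the ML fit
stargazer(model2.lm, se = list(model2.coefs$se), type = 'text',
          omit.stat = c("rsq", "adj.rsq", "f", "ser"))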
I don't know how committed you are to using stargazer, but you can try the broom and xtable packages. The problem is that this won't give you the standard errors for the optim model:
library(broom)
library(xtable)
xtable(tidy(model1))
xtable(tidy(model2))
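If the missing SEs matter, note that model2.coefs (built in the question) already contains them as a column, so a simple fallback is to pass that data frame to xtable directly:
xtable(model2.coefs)  # includes the se column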
