Get standard errors from the output of the "ols" function - r

How do you get standard errors of the coefficients from the output of the "ols" function (package "rms") in R? I know that "coef" gets the coefficients of the ols object but did not find a way to get the standard errors of those coefficients.

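You should use summary. For rms fits, summary needs a datadist object registered through options(datadist = ...):
# Example Data
Fact1 <- runif(200)
Fact2 <- sample(0:3, 200, TRUE)
distance <- (Fact1 + Fact2/3 + rnorm(200))^2
d <- rms::datadist(Fact1, Fact2)
options(datadist = "d") # summary() looks up the datadist object by this name
# Model
ols_model <- rms::ols(sqrt(distance) ~ rms::rcs(Fact1,4) + rms::scored(Fact2), x = TRUE)
# Summary
model_summary <- summary(ols_model)
# Isolate SEs (column "S.E.")
model_se <- model_summary[, "S.E."]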
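Note that summary() on an rms fit reports standard errors of effect estimates (contrasts between predictor values), not of the raw coefficients. The usual route to coefficient standard errors is the square root of the diagonal of vcov(). Here is a minimal sketch on a base lm fit so that it runs without extra packages. The model below is a simplified stand-in for the one above, and rms supplies its own vcov method, so sqrt(diag(vcov(ols_model))) should behave the same way (worth verifying on your installed version).

```r
# Simulated data in the spirit of the example above (seed added for reproducibility)
set.seed(1)
Fact1 <- runif(200)
Fact2 <- sample(0:3, 200, TRUE)
distance <- (Fact1 + Fact2 / 3 + rnorm(200))^2

# Simplified stand-in model fitted with base R
lm_model <- lm(distance ~ Fact1 + Fact2)

# Standard errors of the coefficients: square roots of the
# diagonal of the estimated covariance matrix
coef_se <- sqrt(diag(vcov(lm_model)))

# The same numbers as reported in the coefficient table of summary.lm
summary_se <- coef(summary(lm_model))[, "Std. Error"]

coef_se
```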

Related

Dummies not included in summary

I want to create a function that performs a panel regression with 3-level dummies included.
Let's consider a within model with time effects:
library(plm)
fit_panel_lr <- function(y, x) {
x[, length(x) + 1] <- y
#adding dummies
mtx <- matrix(0, nrow = nrow(x), ncol = 3)
mtx[cbind(seq_len(nrow(mtx)), 1 + (as.integer(unlist(x[, 2])) - min(as.integer(unlist(x[, 2])))) %% 3)] <- 1
colnames(mtx) <- paste0("dummy_", 1:3)
#converting to pdataframe and adding dummy variables
x <- pdata.frame(x)
x <- cbind(x, mtx)
#performing panel regression
varnames <- names(x)[3:(length(x))]
varnames <- varnames[!(varnames == names(y))]
form <- paste0(varnames, collapse = "+")
x_copy <- data.frame(x)
form <- as.formula(paste0(names(y), "~", form,'-1'))
params <- list(
formula = form, data = x_copy, model = "within",
effect = "time"
)
pglm_env <- list2env(params, envir = new.env())
model_plm <- do.call("plm", params, envir = pglm_env)
model_plm
}
However, if I use data :
data("EmplUK", package="plm")
dep_var<-EmplUK['capital']
df1<-EmplUK[-6]
In output I will get :
> fit_panel_lr(dep_var, df1)
Model Formula: capital ~ sector + emp + wage + output + dummy_1 + dummy_2 +
dummy_3 - 1
<environment: 0x000001ff7d92a3c8>
Coefficients:
sector emp wage output
-0.055179 0.328922 0.102250 -0.002912
Why does the formula include the dummies while the coefficients do not? Is there a rational explanation, or did I do something wrong?
You do not see the dummies in the output because they are linearly dependent on the other data after the fixed-effects time transformation. They are dropped, so only what is estimable is estimated and printed.
Below is some (not readily executable) code picking up your example from above:
dat <- cbind(EmplUK, mtx) # mtx being the dummy matrix constructed in your question's code for this data set
pdat <- pdata.frame(dat)
rhs <- paste(c("emp", "wage", "output", "dummy_1", "dummy_2", "dummy_3"), collapse = "+")
form <- paste("capital ~" , rhs)
form <- formula(form)
mod <- plm(form, data = pdat, model = "within", effect = "time")
detect.lindep(mod$model) # before FE time transformation (original data) -> nothing offending
detect.lindep(model.matrix(mod)) # after FE time transformation -> dummies are offending
The help page for detect.lindep (?detect.lindep, included in package plm) has more examples of linear dependence before and after the FE transformation.
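To see why the dummies vanish: in the question they are built from the year alone, so they are constant within each time period, and subtracting the per-period mean (which is what the time within transformation does) leaves only zeros. A minimal base-R sketch on a hypothetical toy panel (the data and the helper demean_by_year are made up for illustration; plm is not needed):

```r
# Hypothetical toy panel: 4 firms observed over 6 years
panel <- expand.grid(firm = 1:4, year = 1976:1981)

# A dummy that depends on the year only (year modulo 3), as in the question
panel$dummy_1 <- as.numeric((panel$year - min(panel$year)) %% 3 == 0)

# A regressor that varies across firms within a year
set.seed(42)
panel$emp <- rnorm(nrow(panel))

# Time fixed-effects transformation: subtract the per-year mean
demean_by_year <- function(v) v - ave(v, panel$year)

dummy_within <- demean_by_year(panel$dummy_1)
emp_within   <- demean_by_year(panel$emp)

range(dummy_within)  # the dummy has no within-year variation left
var(emp_within)      # emp still varies after the transformation
```

A column of zeros carries no information, so the estimation routine cannot identify a coefficient for it and drops it.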
A suggestion:
As for constructing dummy variables, I suggest using an R factor with three levels rather than constructing the dummy matrix yourself. Using a factor is typically more convenient and less error-prone. It is converted to binary dummies (treatment coding) by your typical estimation function via the model.frame/model.matrix framework.
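A minimal sketch of the factor approach, assuming the three groups are derived from the year modulo 3 as in the question (the variable names year and period are illustrative):

```r
# Illustrative years; in the question these come from EmplUK$year
year <- rep(1976:1984, times = 2)

# Three-level factor instead of a hand-made dummy matrix
period <- factor((year - min(year)) %% 3 + 1,
                 labels = paste0("dummy_", 1:3))

# model.matrix() expands the factor into treatment-coded dummies:
# an intercept plus indicators for levels 2 and 3 (level 1 is the reference)
mm <- model.matrix(~ period)
colnames(mm)
head(mm)
```

In a model formula you would simply write the factor on the right-hand side and let the estimation function do this expansion.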

How to use the replicate function in R to repeat the function

I have a problem using replicate to repeat a function.
I tried to use the bootstrap to fit a quadratic model with concentration as the predictor and Total_lignin as the response, and I want to report an estimate of the maximum with a corresponding standard error.
My idea is to create a function called bootFun that does everything that would happen within one iteration of a for loop. bootFun takes only the data set, the predictor, and the response (both variable names in quotes).
However, the SD comes out as 0, which is not correct. I do not know where the mistake is. Could you please help me with it?
# Load the libraries
library(dplyr)
library(tidyverse)
# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>%
filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>%
arrange(variety)
# Check the data
head(dat)
# sample size
n <- nrow(dat)
# A function to do one iteration
bootFun <- function(dat, pred, resp){
# Draw the sample size from the dataset
sample <- sample_n(dat, n, replace = TRUE)
# A quadratic model fit
formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
fit <- lm(formula, data = sample)
# Derive the max of the value of concentration
max <- -fit$coefficients[2]/(2*fit$coefficients[3])
return(max)
}
max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin' )
# Iterated times
N <- 5000
# Use 'replicate' function to do a loop
maxs <- replicate(N, max)
# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)
Function boot from package boot (a recommended package shipped with R) eases the job of calling the bootstrap function repeatedly. The first argument of the statistic function must be the data set; the second is an indices argument, which the user does not set; other arguments can also be passed to it. In this case, those other arguments are the predictor and response names.
library(boot)
bootFun <- function(dat, indices, pred, resp){
  # Draw the bootstrap sample from the dataset
  dat.sample <- dat[indices, ]
  # A quadratic model fit
  formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
  formula <- as.formula(formula)
  fit <- lm(formula, data = dat.sample)
  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])
  return(max)
}
N <- 5000
set.seed(1234) # Make the bootstrap results reproducible
results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')
results
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = dat, statistic = bootFun, R = N, pred = "concentration",
# resp = "Total_lignin")
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* -0.4629808 -0.0004433889 0.03014259
#
results$t0 # this is the statistic, not bootstrapped
#concentration
# -0.4629808
mean(results$t) # bootstrap value
#[1] -0.4633233
Note that to fit a polynomial, using function poly is much simpler than explicitly writing down the polynomial terms one by one.
formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')
Check the distribution of the bootstrapped statistic.
op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)
Test data
set.seed(2020) # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)
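As for why the SD in the question is 0: replicate(N, max) repeats a number that was computed once, so all N values are identical. replicate re-evaluates its second argument on every iteration, so the function call itself has to go there. A minimal sketch with base R only, using the test data above (boot_once is an illustrative name, and 500 replicates are used just to keep it quick):

```r
# Test data, as above
set.seed(2020)
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)

# One bootstrap iteration: resample rows, refit, return the vertex
boot_once <- function(dat, pred, resp) {
  resampled <- dat[sample(nrow(dat), replace = TRUE), ]
  # Build resp ~ pred + I(pred^2) from the variable names
  form <- reformulate(c(pred, sprintf("I(%s^2)", pred)), response = resp)
  b <- unname(coef(lm(form, data = resampled)))
  -b[2] / (2 * b[3])
}

# Pass the call, not a precomputed value, so it is re-run each time
maxs <- replicate(500, boot_once(dat, "concentration", "Total_lignin"))

mean(maxs)
sd(maxs)
```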

Fast post hoc computation using R

I have a large dataset on which I would like to perform post hoc computations:
dat = as.data.frame(matrix(runif(10000*300), ncol = 10000, nrow = 300))
dat$group = rep(letters[1:3], 100)
Here is my code:
start <- Sys.time()
vars <- names(dat)[-ncol(dat)]
aov.out <- lapply(vars, function(x) {
lm(substitute(i ~ group, list(i = as.name(x))), data = dat)})
TukeyHSD.out <- lapply(aov.out, function(x) TukeyHSD(aov(x)))
Sys.time() - start
Time difference of 4.033335 mins
It takes about 4 minutes. Are there more efficient and elegant ways to perform post hoc tests in R?
Thanks a lot
Your example is too big. For illustration of the idea I use a small one.
set.seed(0)
dat = as.data.frame(matrix(runif(2*300), ncol = 2, nrow = 300))
dat$group = rep(letters[1:3], 100)
Why do you call aov on a fitted "lm" model? That basically refits the same model.
Have a read of "Fitting a linear model with multiple LHS" first. lm is the workhorse of aov, so you can pass a multiple-LHS formula to aov. The model has class c("maov", "aov", "mlm", "lm").
response_names <- names(dat)[-ncol(dat)]
form <- as.formula(sprintf("cbind(%s) ~ group", toString(response_names)))
fit <- do.call("aov", list(formula = form, data = quote(dat)))
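A quick check of what the multiple-LHS fit contains, written out with an explicit formula for the two-response toy data (a small sketch; single is an illustrative name for the one-response fit used for comparison):

```r
set.seed(0)
dat <- as.data.frame(matrix(runif(2 * 300), ncol = 2, nrow = 300))
dat$group <- rep(letters[1:3], 100)

# One call fits all responses at once
fit <- aov(cbind(V1, V2) ~ group, data = dat)
class(fit)

# Residuals come back as a matrix with one column per response
dim(residuals(fit))

# Each column matches the residuals of the corresponding single-response fit
single <- aov(V1 ~ group, data = dat)
all.equal(unname(residuals(fit)[, "V1"]), unname(residuals(single)))
```

The design matrix is decomposed only once for all responses, which is where the time saving comes from.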
Now the issue is that there is no "maov" method for TukeyHSD, so we need a hack.
TukeyHSD relies on the residuals of the fitted model. In the c("aov", "lm") case the residuals are a vector, but in the c("maov", "aov", "mlm", "lm") case they are a matrix. The following demonstrates the hack.
aov_hack <- fit
aov_hack[c("coefficients", "fitted.values")] <- NULL ## don't need them
aov_hack[c("contrasts", "xlevels")] <- NULL ## don't need them either
attr(aov_hack$model, "terms") <- NULL ## don't need it
class(aov_hack) <- c("aov", "lm") ## drop "maov" and "mlm"
## the following elements are mandatory for `TukeyHSD`
## names(aov_hack)
#[1] "residuals" "effects" "rank" "assign" "qr"
#[6] "df.residual" "call" "terms" "model"
N <- length(response_names) ## number of response variables
result <- vector("list", N)
for (i in 1:N) {
## change response variable in the formula
aov_hack$call[[2]][[2]] <- as.name(response_names[i])
## change residuals
aov_hack$residuals <- fit$residuals[, i]
## change effects
aov_hack$effects <- fit$effects[, i]
## change "terms" object and attribute
old_tm <- terms(fit) ## old "terms" object in the model
old_tm[[2]] <- as.name(response_names[i]) ## change response name in terms
new_tm <- terms.formula(formula(old_tm)) ## new "terms" object
aov_hack$terms <- new_tm ## replace `aov_hack$terms`
## replace data in the model frame
aov_hack$model[1] <- data.frame(fit$model[[1]][, i])
names(aov_hack$model)[1] <- response_names[i]
## run `TukeyHSD` on `aov_hack`
result[[i]] <- TukeyHSD(aov_hack)
}
result[[1]] ## for example
# Tukey multiple comparisons of means
# 95% family-wise confidence level
#
#Fit: aov(formula = V1 ~ group, data = dat)
#
#$group
# diff lwr upr p adj
#b-a -0.012743870 -0.1043869 0.07889915 0.9425847
#c-a -0.022470482 -0.1141135 0.06917254 0.8322109
#c-b -0.009726611 -0.1013696 0.08191641 0.9661356
I have used a "for" loop. Replace it with lapply if you want.

R: Robust SE's and model diagnostics in stargazer table

I am trying to put some 2SLS regression outputs generated via ivreg() from the AER package into a LaTeX document using the stargazer package. However, I have a couple of problems that I can't seem to solve myself.
I can't figure out how to insert the model diagnostics provided by the summary of ivreg(), namely the weak instruments, Wu-Hausman, and Sargan tests. I would like to have them with the statistics usually reported underneath the table, like the number of observations, R-squared, and residual SE. The stargazer function doesn't seem to have an argument where you can provide a list of additional diagnostics. I didn't put this into my example because I honestly have no clue where to begin.
I want to replace the normal standard errors with robust standard errors, and the only way I found to do this is to produce objects with robust standard errors and add them in the stargazer() function with se=list(). I put this into the minimal working example below. Is there a more elegant way to code this, or a way to re-estimate the model and save it with robust standard errors?
library(AER)
library(stargazer)
y <- rnorm(100, 5, 10)
x <- rnorm(100, 3, 15)
z <- rnorm(100, 3, 7)
a <- rnorm(100, 1, 7)
b <- rnorm(100, 3, 5)
# Fitting IV models
fit1 <- ivreg(y ~ x + a |
a + z,
model = TRUE)
fit2 <- ivreg(y ~ x + a |
a + b + z,
model = TRUE)
# Here are the se's and the diagnostics i want
summary(fit1, vcov = sandwich, diagnostics=T)
summary(fit2, vcov = sandwich, diagnostics=T)
# Getting robust se's, i think HC0 is the standard
# used with "vcov=sandwich" from the above summary
cov1 <- vcovHC(fit1, type = "HC0")
robust1 <- sqrt(diag(cov1))
cov2 <- vcovHC(fit2, type = "HC0")
robust2 <- sqrt(diag(cov2))
# Create latex table
stargazer(fit1, fit2, type = "latex", se=list(robust1, robust2))
Here's one way to do what you want:
library(lmtest)
rob.fit1 <- coeftest(fit1, function(x) vcovHC(x, type="HC0"))
rob.fit2 <- coeftest(fit2, function(x) vcovHC(x, type="HC0"))
summ.fit1 <- summary(fit1, vcov. = function(x) vcovHC(x, type="HC0"), diagnostics=TRUE)
summ.fit2 <- summary(fit2, vcov. = function(x) vcovHC(x, type="HC0"), diagnostics=TRUE)
stargazer(fit1, fit2, type = "text",
se = list(rob.fit1[,"Std. Error"], rob.fit2[,"Std. Error"]),
add.lines = list(c(rownames(summ.fit1$diagnostics)[1],
round(summ.fit1$diagnostics[1, "p-value"], 2),
round(summ.fit2$diagnostics[1, "p-value"], 2)),
c(rownames(summ.fit1$diagnostics)[2],
round(summ.fit1$diagnostics[2, "p-value"], 2),
round(summ.fit2$diagnostics[2, "p-value"], 2)) ))
Which will yield:
==========================================================
                                  Dependent variable:
                              ----------------------------
                                           y
                                   (1)            (2)
----------------------------------------------------------
x                                -1.222         -0.912
                                 (1.672)        (1.002)
a                                -0.240         -0.208
                                 (0.301)        (0.243)
Constant                          9.662         8.450**
                                 (6.912)        (4.222)
----------------------------------------------------------
Weak instruments                  0.45           0.56
Wu-Hausman                        0.11           0.18
Observations                       100            100
R2                               -4.414         -2.458
Adjusted R2                      -4.526         -2.529
Residual Std. Error (df = 97)    22.075         17.641
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01
As you can see, this allows the diagnostics to be included manually for the respective models.
You could automate this approach by creating a function that takes a list of models (e.g. list(summ.fit1, summ.fit2)) and outputs the objects required by the se or add.lines arguments.
gaze.coeft <- function(x, col="Std. Error"){
  stopifnot(is.list(x))
  out <- lapply(x, function(y){
    y[ , col]
  })
  return(out)
}
gaze.coeft(list(rob.fit1, rob.fit2))
gaze.coeft(list(rob.fit1, rob.fit2), col=2)
Both calls take a list of coeftest objects and yield the vectors of SEs as expected by se:
[[1]]
(Intercept) x a
6.9124587 1.6716076 0.3011226
[[2]]
(Intercept) x a
4.2221491 1.0016012 0.2434801
The same can be done for the diagnostics:
gaze.lines.ivreg.diagn <- function(x, col="p-value", row=1:3, digits=2){
  stopifnot(is.list(x))
  out <- lapply(x, function(y){
    # inherits() is safer than class(y) == ..., which breaks for multi-class objects
    stopifnot(inherits(y, "summary.ivreg"))
    y$diagnostics[row, col, drop=FALSE]
  })
  # one list element per diagnostic, holding the values for all models
  out <- as.list(data.frame(t(as.data.frame(out)), check.names = FALSE))
  for(i in seq_along(out)){
    out[[i]] <- c(names(out)[i], round(out[[i]], digits=digits))
  }
  return(out)
}
gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), row=1:2)
gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), col=4, row=1:2, digits=2)
Both calls will yield:
$`Weak instruments`
[1] "Weak instruments" "0.45" "0.56"
$`Wu-Hausman`
[1] "Wu-Hausman" "0.11" "0.18"
Now the stargazer() call becomes as simple as this, yielding output identical to the above:
stargazer(fit1, fit2, type = "text",
se = gaze.coeft(list(rob.fit1, rob.fit2)),
add.lines = gaze.lines.ivreg.diagn(list(summ.fit1, summ.fit2), row=1:2))

Multiply coefficients with standard deviation

In R, the stargazer package offers the possibility to apply functions to the coefficients, standard errors, etc:
dat <- read.dta("http://www.ats.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
id <- factor(id)
})
m1 <- glm.nb(daysabs ~ math + prog, data = dat)
transform_coef <- function(x) (exp(x) - 1)
stargazer(m1, apply.coef=transform_coef)
How can I apply a function where the factor with which I multiply depends on the variable, like the standard deviation of that variable?
This may not be exactly what you hoped for, but you can transform the coefficients, and give stargazer a custom list of coefficients. For example, if you would like to report the coefficient times the standard deviation of each variable, the following extension of your example could work:
library(foreign)
library(stargazer)
library(MASS)
dat <- read.dta("http://www.ats.ucla.edu/stat/stata/dae/nb_data.dta")
dat <- within(dat, {
prog <- factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational"))
id <- factor(id)
})
m1 <- glm.nb(daysabs ~ math + prog, data = dat)
# Store coefficients (and other coefficient stats)
s1 <- summary(m1)$coefficients
# Calculate standard deviations (using zero for the constant)
math.sd <- sd(dat$math)
acad.sd <- sd(as.numeric(dat$prog == "Academic"))
voc.sd <- sd(as.numeric(dat$prog == "Vocational"))
int.sd <- 0
# Append standard deviations to stored coefficients
StdDev <- c(int.sd, math.sd, acad.sd, voc.sd)
s1 <- cbind(s1, StdDev)
# Store custom list
new.coef <- s1[ , "Estimate"] * s1[ , "StdDev"]
# Output
stargazer(m1, coef = list(new.coef))
You may want to consider a couple of issues beyond your original question about outputting coefficients in stargazer. Should you report the intercept when multiplying by the standard deviation? Will your standard errors and inference be the same with this transformation?
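If you would rather not compute each standard deviation by hand, the columns of the model matrix can supply them. The sketch below uses the built-in mtcars data and a plain lm fit as a stand-in for the negative binomial model (an assumption made to keep it self-contained; model.matrix, coef and vcov also have methods for glm-type fits). It also scales the standard errors by the same factors, which is only appropriate if the standard deviations are treated as fixed constants rather than estimates.

```r
# Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Standard deviation of every column of the design matrix;
# the intercept column is constant, so its SD is 0
X <- model.matrix(fit)
x_sd <- apply(X, 2, sd)

# Coefficient times the SD of its own regressor
scaled_coef <- coef(fit) * x_sd

# Scaling the SEs by the same constants keeps the ratio
# estimate / SE unchanged for the non-intercept terms
scaled_se <- sqrt(diag(vcov(fit))) * x_sd

cbind(scaled_coef, scaled_se)
```

The resulting vectors could then be passed to stargazer through coef = list(scaled_coef) and se = list(scaled_se). The intercept row becomes zero, which is one reason to consider omitting it from the table.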
