Including t statistics in regression output using stargazer in R

I have three regressions that I am trying to combine into one table using the stargazer function. I have the following code:
library(Jmisc)
library(tidyverse)
library(sandwich)
library(lmtest)
library(multiwayvcov)
library(stargazer)
set.seed(123)
df <- data.frame(
  x1 = rnorm(10, mean=0, sd=1),
  x2 = rnorm(10, mean=0, sd=1),
  y  = rnorm(10, mean=0, sd=1)
)
r1 <- lm(y ~ x1 + x2, df)
cov1 <- vcovHC(r1, type="HC1", cluster="clustervar")
robust.se1 <- sqrt(diag(cov1))
t1 <- coef(r1)/robust.se1
r2 <- lm(y ~ x1, df)
cov2 <- vcovHC(r2, type="HC1", cluster="clustervar")
robust.se2 <- sqrt(diag(cov2))
t2 <- coef(r2)/robust.se2
r3 <- lm(y ~ x2, df)
cov3 <- vcovHC(r3, type="HC1", cluster="clustervar")
robust.se3 <- sqrt(diag(cov3))
t3 <- coef(r3)/robust.se3
stargazer(r1, r2, r3,
          se = NULL,
          t = list(t1, t2, t3),
          align = TRUE,
          type = "html",
          nobs = TRUE,
          out = "StargazerTest.txt")
The table that is produced reports standard errors rather than the t-statistics I created. This is most likely due to how I am calling stargazer at the bottom. I have looked through its documentation and still don't understand how to get it to do what I want.

As explained here, you can specify which values to report with the report argument (available since stargazer 5.0). In your case, remove se = NULL and t = list(t1, t2, t3) and add:
report = ('c*t')
so the call becomes:
stargazer(r1, r2, r3,
          report = ('c*t'),
          align = TRUE,
          type = "html",
          nobs = TRUE,
          out = "StargazerTest.txt")
Edit: since you need robust standard errors, use the coeftest function (from the lmtest package) instead of computing the robust standard errors and t-statistics manually. Below is an example on one of your regressions:
library(Jmisc)
library(tidyverse)
library(sandwich)
library(lmtest)
library(multiwayvcov)
library(stargazer)
set.seed(123)
df <- data.frame(
  x1 = rnorm(10, mean=0, sd=1),
  x2 = rnorm(10, mean=0, sd=1),
  y  = rnorm(10, mean=0, sd=1)
)
r1 <- lm(y ~ x1 + x2, df)
cov1 <- vcovHC(r1, type="HC1", cluster="clustervar")
robust.se1 <- sqrt(diag(cov1))
t1 <- coef(r1)/robust.se1
foo <- coeftest(r1, vcov = vcovHC(r1, type = "HC1"))
stargazer(foo,
          report = ('c*t'),
          align = TRUE,
          type = "html",
          nobs = TRUE,
          out = "StargazerTest.txt")
Notice that foo gives the same t-values as t1, but it also carries the coefficients, standard errors, etc., which lets stargazer work properly.
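For completeness, the same coeftest approach can be applied to all three models and the results passed to a single stargazer call. This is a minimal sketch, not part of the original answer, assuming the r1, r2, r3 objects defined in the question:
ct1 <- coeftest(r1, vcov = vcovHC(r1, type = "HC1"))
ct2 <- coeftest(r2, vcov = vcovHC(r2, type = "HC1"))
ct3 <- coeftest(r3, vcov = vcovHC(r3, type = "HC1"))
stargazer(ct1, ct2, ct3,
          report = ('c*t'),   # coefficients, significance stars, t-statistics
          type = "html",
          out = "StargazerTest.txt")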

Related

How to have output from lm() include std. error and others without using summary() for stargazer

I'm fitting several linear models in R in the following way:
set.seed(12345)
n = 100
x1 = rnorm(n)
x2 = rnorm(n)+0.1
y = x1 + rnorm(n)
df <- data.frame(x1, x2, y)
x_str <- c("x1", "x1+x2")
regf_lm <- function(df, y_var, x_str) {
  frmla <- formula(paste0(y_var, " ~ ", x_str))
  fit <- lm(frmla, data = df)
  summary(fit) #fit
}
gbind_lm <- function(vv) {
  n <- vv %>% length()
  fits <- list()
  coefs <- list()
  ses <- list()
  for (i in 1:n) {
    coefs[[i]] <- vv[[i]]$coefficients[, 1]
    ses[[i]] <- vv[[i]]$coefficients[, 2]
    fits[[i]] <- vv[[i]]
  }
  list("fits" = fits, "coefs" = coefs, "ses" = ses)
}
stargazer_lm <- function(mylist, fname, title_str, m_type = "html", ...) {
  stargazer(mylist$fits, coef = mylist$coefs,
            se = mylist$ses,
            type = m_type, title = title_str,
            out = paste0("~/projects/outputs", fname), single.row = T, ...)
}
p_2 <- map(x_str,
           ~ regf_lm(df = df,
                     y_var = "y", x_str = .))
m_all <- do.call(c, list(p_2)) %>% gbind_lm()
stargazer_lm(m_all, "name.html", "My model", m_type = "html")
In regf_lm, if I use summary(fit) on the last line, I'm able to generate regression output with columns for the estimated coefficients, std. error, etc. But stargazer() does not work with summary(lm()) (it returns the error "$ operator is invalid for atomic vectors"). However, if I just use fit on the last line of regf_lm, the output shows only the estimated coefficients and not the std. error, R squared, etc., and gbind_lm() won't work because I cannot extract the ses or fits.
Any advice is greatly appreciated.
You can directly export model statistics in tidy format with the broom package:
library(broom)
library(dplyr)   # for select()
set.seed(12345)
n = 100
x1 = rnorm(n)
x2 = rnorm(n)+0.1
y = x1 + rnorm(n)
df <- data.frame(x1, x2, y)
x_str <- c("x1", "x1+x2")
regf_lm <- function(df, y_var, x_str) {
  frmla <- formula(paste0(y_var, " ~ ", x_str))
  fit <- lm(frmla, data = df)
  return(list(fit, select(broom::tidy(fit), std.error))) #fit
}
exm_model <- regf_lm(iris,'Sepal.Width','Sepal.Length')
stargazer(exm_model[[1]], coef = exm_model[[2]], title = 'x_model',
out ='abc', single.row = T)
This piece of code worked on my local machine with no problem; I think you can apply it in your workflow.
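For reference, here is a hedged sketch (not from the original answer) of how the same broom-based idea could plug into the question's map() workflow. It assumes the df and x_str objects defined above and that purrr and dplyr are available:
library(purrr)
library(dplyr)
library(broom)
library(stargazer)
fits <- map(x_str, ~ lm(formula(paste0("y ~ ", .x)), data = df))
ses  <- map(fits, ~ pull(tidy(.x), std.error))   # one vector of std. errors per model
stargazer(fits, se = ses, type = "html",
          title = "My model", out = "name.html", single.row = TRUE)
Here stargazer still receives the lm objects (so it can report R squared, N, etc.), while se supplies the standard errors extracted with broom; in this example they equal the defaults, but the same mechanism works for robust or clustered errors.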

Linear regression with ongoing data, in R

Model:
y ~ x1 + x2 + x3
about 1000 rows
What I want to do is make predictions step by step: use rows 1:20 to predict y for rows 21:30, then use rows 11:30 to predict y for rows 31:40, and so on.
You can use the predict function:
mod = lm(y ~ ., data=df[1:990,])
pred = predict(mod, newdata=df[991:1000,2:4])
Edit: to change the range of training data in a loop:
index = seq(10,990,10)
pred = matrix(nrow=10, ncol=length(index))
MSE = numeric(length(index))  # initialize the MSE vector before the loop
for(i in index){
  mod = lm(y ~ ., data=df[1:i,])
  pred[,i/10] = predict(mod, newdata=df[(i+1):(i+10),2:4])
  MSE[i/10] = sum((df$y[(i+1):(i+10)]-pred[,i/10])^2)
}
mean(MSE)
Are you looking for something like this?
# set up mock data
set.seed(1)
df <- data.frame(y  = rnorm(1000),
                 x1 = rnorm(1000),
                 x2 = rnorm(1000),
                 x3 = rnorm(1000))
# for loop
prd <- list()
for(i in 1:970){
  # training data
  trn <- df[i:(i+20), ]
  # test data
  tst <- df[(i+21):(i+30), ]
  # lm model
  mdl <- lm(y ~ x1 + x2 + x3, trn)
  # append a data.frame with both predicted and actual values
  # for later comparison
  prd[[i]] <- data.frame(prd = predict(mdl, tst[-1]),
                         act = tst[[1]])
}
# your list
prd
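A short follow-up sketch (not in the original answer): collapse the list into one data.frame and compute an overall mean squared error, assuming prd is the list produced by the loop above.
res <- do.call(rbind, prd)        # stack all windows into one data.frame
mean((res$act - res$prd)^2)       # overall out-of-sample MSE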
You can also try something fancier with the package slider:
# define here your model and how you want to handle the predictions
sliding_lm <- function(..., frm, n_trn, n_tst){
  df <- data.frame(...)
  trn <- df[1:n_trn, ]
  tst <- df[n_trn + 1:n_tst, ]
  mdl <- lm(frm, trn)   # use the formula passed in via frm
  data.frame(prd = predict(mdl, tst[-1]),
             act = tst[[1]])
}
n_trn <- 20                 # number of training obs
n_tst <- 10                 # number of test obs
frm <- y ~ x1 + x2 + x3     # formula of your model
prd <- slider::pslide(df, sliding_lm,
                      frm = frm,
                      n_trn = n_trn,
                      n_tst = n_tst,
                      .after = n_trn + n_tst,
                      .complete = TRUE)
Note that the last 30 entries of the list are NULL, because only complete windows (30 observations covering both training and test) are used.
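If you want to discard those incomplete windows, one possibility (a sketch, not part of the answer) is:
prd <- prd[!vapply(prd, is.null, logical(1))]   # keep only the non-NULL windows
length(prd)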

How to calculate lambda.1se from iterative cross validation?

I have the following code to choose a value of lambda based on the lowest resulting mean squared error (MSE) after iterated cross validation.
library(glmnet)
set.seed(3)
IV1 <- data.frame(IV1 = rnorm(100))
IV2 <- data.frame(IV2 = rnorm(100))
IV3 <- data.frame(IV3 = rnorm(100))
IV4 <- data.frame(IV4 = rnorm(100))
IV5 <- data.frame(IV5 = rnorm(100))
DV <- data.frame(DV = rnorm(100))
data <- data.frame(IV1,IV2,IV3,IV4,IV5,DV)
x <- model.matrix(DV~.-IV5 , data)[ , -1]
y <- data$DV
AB <- glmnet(x=x, y=y, alpha=1)
plot(AB,xvar="lambda")
lambdas <- NULL
for (i in 1:100){
  fit <- cv.glmnet(x, y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
}
lambdas <- aggregate(lambdas[ , 2], list(lambdas$fit.lambda), mean)
bestindex <- which(lambdas[2]== min(lambdas[2]))
bestlambda <- lambdas[bestindex,1]
How would I modify this to select lambda.1se (i.e., the largest λ at which the MSE is within one standard error of the minimal MSE)?
Edit:
How about this?
library(plotrix)   # for std.error()
lambdas <- NULL    # initialize
r2 <- NULL         # initialize
n.fits <- 100
for (i in 1:n.fits) {
  fit <- cv.glmnet(x, y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
  r2[i] <- max(1 - fit$cvm/var(y))
}
# take the mean cvm for each lambda
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
lambdas <- as.data.frame(lambdas)
# find the subset with mse within 1 se of the minimum
onese <- std.error(lambdas[2])
min <- min(lambdas[2])
low <- min - onese
high <- min + onese
lambdas <- subset(lambdas, x > low)
lambdas <- subset(lambdas, x < high)
# choose the highest lambda among those
bestindex <- which(lambdas[1] == max(lambdas[1]))
bestlambda <- lambdas[bestindex, 1]
If you decide to use cv.glmnet, the following might be what you are looking for. (P.S. I also cleaned up your simulation code a bit; note that I did not use the AB object from glmnet, which is obviously not the same as cv.glmnet.)
library(glmnet)
## Simulate data:
set.seed(3)
x <- data.frame(
  IV1 = rnorm(100),
  IV2 = rnorm(100),
  IV3 = rnorm(100),
  IV4 = rnorm(100),
  IV5 = rnorm(100)
)
x <- as.matrix(x)
y <- rnorm(100) #target or response
## Iteratively fit models
lambdas <- NULL #initialize
n.fits <- 100
for (i in 1:n.fits) {
  fit <- cv.glmnet(x, y, family = "gaussian")
  df <- data.frame(fit$lambda.1se, mean(fit$cvm))  # could also use the median of cvm
  lambdas <- rbind(lambdas, df)
}
## Select best lambda:
bestindex <- which.min(lambdas[ , 2]) #the way you had it was way too complicated
bestlambda <- lambdas[bestindex, 1]
bestlambda
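If you specifically want the 1-SE rule applied to the aggregated curve (the largest lambda whose mean CV error is within one standard error of the minimum), a hedged sketch follows. It assumes the un-aggregated lambdas data frame from the question's loop (columns fit.lambda and fit.cvm) and uses the standard error across the repeated fits:
cv_mean <- aggregate(fit.cvm ~ fit.lambda, data = lambdas, FUN = mean)
cv_se   <- aggregate(fit.cvm ~ fit.lambda, data = lambdas,
                     FUN = function(v) sd(v) / sqrt(length(v)))
i.min      <- which.min(cv_mean$fit.cvm)                     # lambda with the lowest mean MSE
threshold  <- cv_mean$fit.cvm[i.min] + cv_se$fit.cvm[i.min]  # one SE above that minimum
lambda.1se <- max(cv_mean$fit.lambda[cv_mean$fit.cvm <= threshold])
lambda.1se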

How to export all coefficients of a penalized regression model from package `penalized`? Need them for reporting rolling regression estimates

I have been able to run a regression with some coefficients constrained to positive territory, but I'm doing a lot of rolling regressions, which is where I face the problem. Here is my sample code:
library(penalized)
set.seed(1)
x1=rnorm(100)*10
x2=rnorm(100)*10
x3=rnorm(100)*10
y=sin(x1)+cos(x2)-x3+rnorm(100)
data <- data.frame(y, x1, x2, x3)
win <- 10
coefs <- matrix(NA, ncol=4, nrow=length(y))
for(i in 1:(length(y)-win)) {
  d <- data[(1+i):(win+i),]
  p <- win+i
  # Linear Regression
  coefs[p,] <- as.vector(coef(penalized(y, ~ x1 + x2 + x3, ~1,
                                        lambda1=0, lambda2=0, positive = c(F, F, T), data=data)))
}
This is how I usually populate a matrix with coefficients from a rolling regression, but now I receive the error:
Error in coefs[p, ] <- as.vector(coef(penalized(y, ~x1 + x2 + x3, ~1, :
number of items to replace is not a multiple of replacement length
I assume this error is produced because the penalized regression function does not always return the intercept plus all 3 coefficients. Is there a way to get the penalized function to report zero coefficients as well, or another way to populate the matrix / data.frame?
Perhaps you are unaware of the which argument of coef for "penfit" objects. Have a look at:
getMethod(coef, "penfit")
#function (object, ...)
#{
# .local <- function (object, which = c("nonzero", "all", "penalized",
# "unpenalized"), standardize = FALSE)
# {
# coefficients(object, which, standardize)
# }
# .local(object, ...)
#}
#<environment: namespace:penalized>
We can set which = "all" to report all coefficients. The default is which = "nonzero", which is what causes the "number of items to replace is not a multiple of replacement length" error.
The following works:
library(penalized)
set.seed(1)
x1 = rnorm(100)*10
x2 = rnorm(100)*10
x3 = rnorm(100)*10
y = sin(x1) + cos(x2) - x3 + rnorm(100)
data <- data.frame(y, x1, x2, x3)
win <- 10
coefs <- matrix(NA, ncol=4, nrow=length(y))
for(i in 1:(length(y)-win)) {
  d <- data[(1+i):(win+i),]
  p <- win + i
  pen <- penalized(y, ~ x1 + x2 + x3, ~1, lambda1 = 0, lambda2 = 0,
                   positive = c(F, F, T), data = d)  # fit on the current window d
  beta <- coef(pen, which = "all")
  coefs[p,] <- unname(beta)
}
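A small optional follow-up (a sketch, not from the original answer): once the loop has run, label the columns of coefs so the rolling estimates are easier to inspect. It relies on beta, which still holds the named coefficients from the last iteration.
colnames(coefs) <- names(beta)            # "(Intercept)", "x1", "x2", "x3"
coefs <- as.data.frame(coefs)
head(coefs[!is.na(coefs[, 1]), ])         # drop the leading rows that were never filled in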

Creating function arguments from a named list (with an application to stats4::mle)

I should start by saying what I'm trying to do: I want to use the mle function without having to re-write my log-likelihood function each time I try a different model specification. Because mle expects a named list of starting values, you apparently cannot just write the log-likelihood function to take a vector of parameters. A simple example:
Suppose I want to fit a linear regression model via maximum likelihood and at first, I'm ignoring one of my predictors:
n <- 100
df <- data.frame(x1 = runif(n), x2 = runif(n), y = runif(n))
Y <- df$y
X <- model.matrix(lm(y ~ x1, data = df))
# define log-likelihood function
ll <- function(beta0, beta1, sigma){
  beta = matrix(NA, nrow=2, ncol=1)
  beta[,1] = c(beta0, beta1)
  -sum(log(dnorm(Y - X %*% beta, 0, sigma)))
}
library(stats4)
mle(ll, start = list(beta0 = .1, beta1 = .2, sigma = 1))
Now, if I want to fit a different model, say:
m <- lm(y ~ x1 + x2, data = df)
I cannot re-use my log-likelihood function; I'd have to re-write it to include a beta2 parameter. What I'd like to do is something like:
ll.flex <- function(theta){
# theta is a vector that I can use directly
...
}
If I could then somehow adjust the start argument in mle to account for my now vector-input log-likelihood function, that would work. Or, barring that, I could use a function that constructs the log-likelihood function at run time, say by constructing the named list of arguments and then using it to define the function, e.g., something like this:
X <- model.matrix(lm(y ~ x1 + x2, data = df))
arguments <- rep(NA, dim(X)[2])
names(arguments) <- colnames(X)
ll.magic <- function(bring.this.to.life.as.function.arguments(arguments)){...}
Update:
I ended up writing a helper function that can add an arbitrary number of named arguments x1, x2, x3... to a passed function f.
add.arguments <- function(f, n){
  # adds n arguments to a function f; returns that new function
  t = paste("arg <- alist(",
            paste(sapply(1:n, function(i) paste("x", i, "=", sep="")), collapse=","),
            ")", sep="")
  formals(f) <- eval(parse(text=t))
  f
}
It's ugly, but it got the job done, letting me re-factor my log-likelihood function on the fly.
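For illustration, here is one hedged way the helper might be used (not from the original question; the body and names below are hypothetical). The template gathers whatever arguments add.arguments() generates into a single vector, so the same function works for any number of predictors; it assumes the Y and X objects defined earlier (here X has two columns, so three generated arguments: two betas plus sigma).
ll.template <- function() {
  # collect the generated arguments x1, x2, ... in order
  theta <- unlist(mget(names(formals(sys.function()))))
  beta  <- matrix(theta[-length(theta)], ncol = 1)  # all but the last entry are betas
  sigma <- theta[length(theta)]                     # the last entry is sigma
  -sum(log(dnorm(Y - X %*% beta, 0, sigma)))
}
ll3 <- add.arguments(ll.template, 3)                # now has formals x1, x2, x3
mle(ll3, start = list(x1 = 0.1, x2 = 0.2, x3 = 1))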
You can use the mle2 function from the package bbmle which allows you to pass vectors as parameters. Here is some sample code.
# REDEFINE LOG LIKELIHOOD
ll2 = function(params){
  beta = matrix(NA, nrow = length(params) - 1, ncol = 1)
  beta[,1] = params[-length(params)]
  sigma = params[[length(params)]]
  minusll = -sum(log(dnorm(Y - X %*% beta, 0, sigma)))
  return(minusll)
}
# REGRESS Y ON X1
X <- model.matrix(lm(y ~ x1, data = df))
mle2(ll2, start = c(beta0 = 0.1, beta1 = 0.2, sigma = 1),
vecpar = TRUE, parnames = c('beta0', 'beta1', 'sigma'))
# REGRESS Y ON X1 + X2
X <- model.matrix(lm(y ~ x1 + x2, data = df))
mle2(ll2, start = c(beta0 = 0.1, beta1 = 0.2, beta2 = 0.1, sigma = 1),
vecpar = TRUE, parnames = c('beta0', 'beta1', 'beta2', 'sigma'))
This gives you
Call:
mle2(minuslogl = ll2, start = c(beta0 = 0.1, beta1 = 0.2, beta2 = 0.1,
sigma = 1), vecpar = TRUE, parnames = c("beta0", "beta1",
"beta2", "sigma"))
Coefficients:
beta0 beta1 beta2 sigma
0.5526946 -0.2374106 0.1277266 0.2861055
It might be easier to use optim directly; that's what mle is using anyway.
ll2 <- function(par, X, Y){
  beta <- matrix(c(par[-1]), ncol=1)
  -sum(log(dnorm(Y - X %*% beta, 0, par[1])))
}
getp <- function(X, sigma=1, beta=0.1) {
  p <- c(sigma, rep(beta, ncol(X)))
  names(p) <- c("sigma", paste("beta", 0:(ncol(X)-1), sep=""))
  p
}
set.seed(5)
n <- 100
df <- data.frame(x1 = runif(n), x2 = runif(n), y = runif(n))
Y <- df$y
X1 <- model.matrix(y ~ x1, data = df)
X2 <- model.matrix(y ~ x1 + x2, data = df)
optim(getp(X1), ll2, X=X1, Y=Y)$par
optim(getp(X2), ll2, X=X2, Y=Y)$par
With the output of
> optim(getp(X1), ll2, X=X1, Y=Y)$par
sigma beta0 beta1
0.30506139 0.47607747 -0.04478441
> optim(getp(X2), ll2, X=X2, Y=Y)$par
sigma beta0 beta1 beta2
0.30114079 0.39452726 -0.06418481 0.17950760
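One possible extension (a sketch, not part of the original answer): ask optim for the Hessian to get approximate standard errors of the ML estimates, since the objective is already the negative log-likelihood.
fit <- optim(getp(X2), ll2, X = X2, Y = Y, hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))      # inverse observed information
rbind(estimate = fit$par, std.error = se)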
It might not be what you're looking for, but I would do this as follows:
mle2(y ~ dnorm(mu, sigma),parameters=list(mu~x1 + x2), data = df,
start = list(mu = 1,sigma = 1))
mle2(y ~ dnorm(mu,sigma), parameters = list(mu ~ x1), data = df,
start = list(mu=1,sigma=1))
You might be able to adapt this formulation for a multinomial, although dmultinom might not work -- you might need to write a Dmultinom() that takes a matrix of multinomial samples and returns a (log) probability.
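A purely hypothetical sketch of such a Dmultinom() (untested; the name and interface are invented here): it takes a matrix with one multinomial sample per row and returns the joint (log-)probability under a common prob vector.
Dmultinom <- function(x, size = NULL, prob, log = FALSE) {
  # sum the per-row log-probabilities from dmultinom()
  lp <- sum(apply(x, 1, function(row) dmultinom(row, size = size, prob = prob, log = TRUE)))
  if (log) lp else exp(lp)
}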
The R code that Ramnath provided can also be used with the optim function, because optim also takes vectors as parameters.
