This seems like a very basic problem, but I have not been able to find a solution. I essentially wish to run a linear regression in a for loop and store the model coefficients (and standard errors, if possible) for each iteration in a csv file.
For reference, I am running Fama-MacBeth regressions on macroeconomic "shocks" (the residuals of macroeconomic factors regressed on their lagged values).
My code for the loop is as follows:
library(sandwich)  # for vcovHC()

for (i in 7:69) {
  model <- lm(data[[i]] ~ TM2R + IPR + InfR + UnR + OilR, data = data)
  # Model coefficients
  print(model$coefficients)
  # Heteroskedasticity-robust covariance matrix for the standard errors
  model$vcov <- vcovHC(model, type = "HC1")
  print(model$vcov)
}
You can use the broom package to turn the output from lm() into a data frame, then append the robust standard errors computed from vcovHC() to it. Finally, you export the result as a csv.
library(broom)
library(sandwich)

for (i in 7:69) {
  model_name <- paste0("model_", i, ".csv")
  model_i <- lm(data[[i]] ~ TM2R + IPR + InfR + UnR + OilR, data = data)
  # Coefficient table as a data frame
  tidy_model <- tidy(model_i)
  # HC1 robust standard errors: square root of the diagonal of the
  # robust covariance matrix
  tidy_model$robust_se <- sqrt(diag(vcovHC(model_i, type = "HC1")))
  write.csv(tidy_model, file = model_name, row.names = FALSE)
}
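If you would rather end up with a single file instead of one csv per model, a minimal sketch of the same idea (assuming the same data and formula as above; "all_models.csv" is just a placeholder name) is to tag each tidied table with its response column and bind the tables before writing:
library(broom)
library(sandwich)
library(dplyr)  # for bind_rows()

results <- lapply(7:69, function(i) {
  model_i <- lm(data[[i]] ~ TM2R + IPR + InfR + UnR + OilR, data = data)
  out <- tidy(model_i)
  out$robust_se <- sqrt(diag(vcovHC(model_i, type = "HC1")))
  out$response <- names(data)[i]  # record which column was the response
  out
})
write.csv(bind_rows(results), file = "all_models.csv", row.names = FALSE)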
Related
I have used a for loop to run a series of GAMs in R that regress a series of dependent variables on the same set of independent variables. I want to extract the p.table values from each model, but when I print the p.table objects from my list of model summaries, the p-values are absurdly long (~100 digits), and I cannot figure out how to apply a function to just that component of the p.table output while also printing the whole output.
Here is an example with mtcars. These model results are obviously meaningless; in this case, the p-values are printing fine, but in my data the p-values are way too long, and I want to truncate them in the printed output using, e.g., format.pval.
data(mtcars)
library(mgcv)

y_vars <- c("qsec", "wt", "hp")
models <- list()
for (i in y_vars) {
  models[[i]] <- gam(as.formula(paste(i, "~ cyl + s(drat) + am + gear + carb")),
                     method = "REML", data = mtcars)
}

# Summarise each model and pull out the parametric coefficient tables
models_summ <- lapply(models, summary)
lapply(models_summ, '[[', 'p.table')
I ended up assigning the output to a data frame and operating with it that way:
df <- data.frame(lapply(models_summ, '[[', 'p.table'))
If anyone has more elegant solutions, I would love to see them.
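One possibility is a small helper that formats only the p-value column of each p.table before printing, leaving the other columns numeric. A sketch, assuming a Gaussian fit so the column is named "Pr(>|t|)" (fmt_ptable is a hypothetical helper, not part of mgcv):
# Truncate only the p-value column with format.pval, keep the rest as-is
fmt_ptable <- function(pt, digits = 3) {
  out <- as.data.frame(pt)
  out[["Pr(>|t|)"]] <- format.pval(out[["Pr(>|t|)"]], digits = digits)
  out
}
lapply(models_summ, function(s) fmt_ptable(s$p.table))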
I am running multiple linear regression models and I would like to loop through the results to subsequently generate robust standard errors. My code currently looks like this, but I want to run multiple models and not have to copy the code for calculating robust standard errors for each model.
# load packages and data
library(sandwich)  # for vcovHC()
data(mtcars)

# run models
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# calculate robust standard errors
cov1 <- vcovHC(m1, type = "HC3")
robust_se1 <- sqrt(diag(cov1))
cov2 <- vcovHC(m2, type = "HC3")
robust_se2 <- sqrt(diag(cov2))
How could I write a function to handle this task? I plan to number each model using successive integers, e.g., m1, m2, m3. I have so far not been able to adapt related SO answers on generating variables in a loop, like this one.
Edit: changed to executable code.
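A minimal sketch of one way to avoid the duplication: keep the fitted models in a named list and compute all the robust standard errors with a single lapply().
library(sandwich)
data(mtcars)

# fit the models into a named list instead of numbered objects m1, m2, ...
models <- list(
  m1 = lm(mpg ~ wt, data = mtcars),
  m2 = lm(mpg ~ wt + hp, data = mtcars)
)

# one pass computes HC3 robust standard errors for every model
robust_se <- lapply(models, function(m) sqrt(diag(vcovHC(m, type = "HC3"))))
robust_se$m1  # robust SEs for the first model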
I am trying to streamline my code to avoid for loops, but I am having a hard time extracting p-values and standard errors for the coefficients once I have run my Cox proportional hazards code. My code is as follows:
library(survival)

# Generate data: each of the 100 rows of x is a binary covariate vector
x <- matrix(rbinom(10000, 1, .5), ncol = 100)
y <- rexp(ncol(x), .02)
censor <- rbinom(ncol(x), 1, .5)
event <- ifelse(censor == 1, 0, 1)

# Fit a coxph model for each row of x
ans <- apply(x, 1, function(x, y, event) coxph(Surv(y, event) ~ x),
             y = y, event = event)

# Extract the coefficients from ans
coef <- unname(sapply(ans, function(x) x$coef))
So as you can see, I am able to extract the coefficients from the object ans, but I cannot extract the p-values and standard errors. Is there an easy way to do this from my ans object? Or a simple way to modify this code to do it?
You can just add these two lines of code to get the p-values and the standard errors. In the summary.coxph coefficient table, the se(coef) column holds the standard errors and Pr(>|z|) holds the p-values:
pValues <- sapply(ans, function(m) summary(m)$coefficients[, "Pr(>|z|)"])
se <- sapply(ans, function(m) summary(m)$coefficients[, "se(coef)"])
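If you want the coefficient, standard error, and p-value together, a small sketch (using the same ans object as above) that binds them into one data frame:
# One row per fitted model: coefficient, SE, and p-value
results <- do.call(rbind, lapply(ans, function(m) {
  s <- summary(m)$coefficients
  data.frame(coef = s[, "coef"], se = s[, "se(coef)"], p = s[, "Pr(>|z|)"])
}))
head(results)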
Does anyone know how to get stargazer to display clustered SEs for lm models? (And the corresponding F-test?) If possible, I'd like to follow an approach similar to computing heteroskedasticity-robust SEs with sandwich and popping them into stargazer as in http://jakeruss.com/cheatsheets/stargazer.html#robust-standard-errors-replicating-statas-robust-option.
I'm using lm to get my regression models, and I'm clustering by firm (a factor variable that I'm not including in the regression models). I also have a bunch of NA values, which makes me think multiwayvcov is going to be the best package (see the bottom of landroni's answer here - Double clustered standard errors for panel data - and also https://sites.google.com/site/npgraham1/research/code)? Note that I do not want to use plm.
Edit: I think I found a solution using the multiwayvcov package...
library(lmtest)        # for waldtest()
library(multiwayvcov)  # for cluster.vcov()
library(stargazer)

data(petersen)                   # load data
petersen$z <- petersen$y + 0.35  # create new variable

ols1 <- lm(y ~ x, data = petersen)  # create models
ols2 <- lm(y ~ x + z, data = petersen)

cl.cov1 <- cluster.vcov(ols1, petersen$firmid)  # cluster-robust vcov for ols1
cl.robust.se.1 <- sqrt(diag(cl.cov1))
cl.wald1 <- waldtest(ols1, vcov = cl.cov1)

cl.cov2 <- cluster.vcov(ols2, petersen$firmid)  # cluster-robust vcov for ols2
cl.robust.se.2 <- sqrt(diag(cl.cov2))
cl.wald2 <- waldtest(ols2, vcov = cl.cov2)

# create table in stargazer
stargazer(ols1, ols2, se = list(cl.robust.se.1, cl.robust.se.2), type = "text")
The only downside of this approach is that you have to manually re-enter the F-statistics from the waldtest() output for each model.
Using the packages lmtest and multiwayvcov causes a lot of unnecessary overhead. The easiest way to compute clustered standard errors in R is the modified summary() function. This function allows you to add an additional parameter, called cluster, to the conventional summary() function. The following post describes how to use this function to compute clustered standard errors in R:
https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r/
You can then easily use this summary function to obtain clustered standard errors and add them to the stargazer output. Based on your example, you could simply use the following code:
# estimate models (this relies on the modified summary() function
# from the post linked above)
ols1 <- lm(y ~ x, data = petersen)

# summary with cluster-robust SEs; "cluster_id" stands in for the name
# of your clustering variable
summary(ols1, cluster = "cluster_id")

# create table in stargazer, passing the clustered SEs (column 2 of the
# summary coefficient table)
stargazer(ols1, se = list(coef(summary(ols1, cluster = c("cluster_id")))[, 2]), type = "text")
I would recommend the lfe package, which is much more powerful than lm. You can easily specify the cluster in the regression model:
library(lfe)

ols1 <- felm(y ~ x + z | 0 | 0 | firmid, data = petersen)
summary(ols1)
stargazer(ols1, type = "html")
The clustered standard errors will be produced automatically, and stargazer will report them accordingly.
By the way (allow me to do some more marketing), felm is highly recommended for micro-econometric analysis. You can specify fixed effects and IVs easily. The grammar is:
ols1 <- felm(y ~ x + z | FixedEffect1 + FixedEffect2 | IV | Cluster, data = Data)
I wonder whether I can use something like a for loop or an apply function to do linear regression in R. I have a data frame containing variables such as crim, rm, ad, and wd. I want to run a simple linear regression of crim on each of the other variables.
Thank you!
If you really want to do this, it's pretty trivial with lapply(), where we use it to "loop" over the other columns of df. A custom function takes each variable in turn as x and fits a model for that covariate.
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
# pass df in via the dat argument so each call sees the full data frame
mods <- lapply(df[, -1], function(x, dat) lm(crim ~ x, data = dat), dat = df)
mods is now a list of lm objects. The names of mods are the names of the covariates used to fit each model. The main negative of this is that all the models are fitted using a variable called x; a little more effort can solve that (see the sketch below), though it may not be worth the time.
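For instance, a minimal sketch (same df as above) that keeps each covariate's real name by building the formula with reformulate():
# build crim ~ <var> explicitly so coefficients keep their real names
vars <- names(df)[-1]
mods <- lapply(vars, function(v) lm(reformulate(v, response = "crim"), data = df))
names(mods) <- vars
coef(mods$rm)  # the coefficient is now named "rm" rather than "x"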
If you are just selecting models, which may be dubious, there are other ways to achieve this, for example via the leaps package and its regsubsets() function:
library("leaps")
a <- regsubsets(crim ~ ., data = df, nvmax = 1, nbest = ncol(df) - 1)
summa <- summary(a)
Then plot(a) will show which of the models is "best", for example.
Original
If I understand what you want (crim is a covariate and the other variables are the responses you want to predict/model using crim), then you don't need a loop. You can do this using a matrix response in a standard lm().
Using some dummy data:
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
we create a matrix or multivariate response via cbind(), passing it the three response variables we're interested in. The remaining parts of the call to lm are entirely the same as for a univariate response:
mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)
mods

> mods

Call:
lm(formula = cbind(rm, ad, wd) ~ crim, data = df)

Coefficients:
             rm        ad        wd
(Intercept)  -0.12026  -0.47653  -0.26419
crim         -0.26548   0.07145   0.68426
The summary() method produces a standard summary.lm output for each of the responses.
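For example, each response gets its own block in the printed output:
summary(mods)  # shows "Response rm :", "Response ad :", and "Response wd :" sections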
Suppose your response variable is fixed as the first column of your data frame, and you want to run a simple linear regression of it on each of the other variables individually.
h <- iris[, -5]
for (j in 2:ncol(h)) {
  # regress the first column on column j; store the fit as a2, a3, ...
  assign(paste("a", j, sep = ""), lm(h[, 1] ~ h[, j]))
}
The code above runs one regression per predictor column and stores the fitted models in the objects a2, a3, ....