I'm trying to make a list of lm object using purrr::map.
use mtcars as an example:
vars <- c('hp', 'wt', 'disp')
map(vars, ~lm(mpg~.x, data=mtcars))
error:
Error in model.frame.default(formula = mpg ~ .x, data = mtcars, drop.unused.levels = TRUE) : variable lengths differ (found for '.x')
I also tried:
map(vars, function(x) {x=sym(x); lm(mpg~!!x, data=mtcars)})
I got error message:
Error in !x : invalid argument type
Can anyone tell what I did wrong? Thanks in advance.
The usual way is to paste together formulas as strings, convert them by mapping as.formula (you can't make a vector of formulas; it has to be a list), and then map lm. You can combine it all to a single call if you like, but I've come to prefer mapping single functions, which makes code easier to read:
library(purrr)
c('hp', 'wt', 'disp') %>%
paste('mpg ~', .) %>%
map(as.formula) %>%
map(lm, data = mtcars)
#> [[1]]
#>
#> Call:
#> .f(formula = .x[[i]], data = ..1)
#>
#> Coefficients:
#> (Intercept) hp
#> 30.09886 -0.06823
#>
#>
#> [[2]]
#>
#> Call:
#> .f(formula = .x[[i]], data = ..1)
#>
#> Coefficients:
#> (Intercept) wt
#> 37.285 -5.344
#>
#>
#> [[3]]
#>
#> Call:
#> .f(formula = .x[[i]], data = ..1)
#>
#> Coefficients:
#> (Intercept) disp
#> 29.59985 -0.04122
It's actually unnecessary to call map(as.formula) as lm will coerce it into a formula, but not all models are so generous (e.g. mgcv::gam).
A downside of this approach are that the call listed in the object looks funky, but the coefficients tell you which is which easily enough anyway. A useful alternative is to keep the formula as a string in one column of a data.frame and the model in a list column, e.g.
library(tidyverse)
data_frame(formula = paste('mpg ~', c('hp', 'wt', 'disp')),
model = map(formula, lm, data = mtcars))
#> # A tibble: 3 x 2
#> formula model
#> <chr> <list>
#> 1 mpg ~ hp <S3: lm>
#> 2 mpg ~ wt <S3: lm>
#> 3 mpg ~ disp <S3: lm>
The elegant tidyverse approach demonstrated by #alistaire worked well for me until I tried to pass the list column to the stargazer package and received "% Error: Unrecognized object type."
In case it is helpful for anyone else trying to use purrr map and stargazer, this slight modification solved the issue:
models_out <- data_frame(
formula = paste('mpg ~', c('hp', 'wt', 'disp')),
model = map(
.x = formula,
.f = function(x) lm(x, data = mtcars))
)
stargazer(models_out$model, type = 'text')
===========================================================
Dependent variable:
-----------------------------
mpg
(1) (2) (3)
-----------------------------------------------------------
hp -0.068***
(0.010)
wt -5.344***
(0.559)
disp -0.041***
(0.005)
Constant 30.099*** 37.285*** 29.600***
(1.634) (1.878) (1.230)
-----------------------------------------------------------
Observations 32 32 32
R2 0.602 0.753 0.718
Adjusted R2 0.589 0.745 0.709
Residual Std. Error (df = 30) 3.863 3.046 3.251
F Statistic (df = 1; 30) 45.460*** 91.375*** 76.513***
===========================================================
Note: *p<0.1; **p<0.05; ***p<0.01
Related
I am having troubles with using subset which involves lists in my linear model. One of two things happen, when I use lapply I get object 'x' not found. When I use mapply I get that object 'y' not found. For some reason neither variable is working in subset.
Here are my two approaches:
library(faraway)
data(savings)
form <- "sr ~ pop75 + dpi + ddpi + pop15"
#approach one
lapply(label_res, function(x){
lm(form, savings, subset = (!row.names(savings) %in% x))
})
#output does not change for all linear models
#approach two
form_list<-rep(form, 4) %>% as.list()
mapply(function(x, y)
{lm(x, savings, subset=(!row.names(savings) %in% y))}, form_list, label_res)
>Error in row.names(savings) %in% y : object 'y' not found
However this does work:
library(faraway)
data(savings)
form <- "sr ~ pop75 + dpi + ddpi + pop15"
#approach one
lapply(label_res, function(x){subset(savings, !row.names(savings) %in% x) %>%
lm(form, .)
})
Here s label_res:
list(pop75 = c("Ireland", "Japan"), dpi = c("Ireland", "Sweden",
"United States"), ddpi = "Libya", pop15 = c("Japan", "Libya"))
This is due to the way that lm internally calls model.frame. It tries to evaluate the arguments in the parent frame, which means that it is trying to find an object called x in the global environment, when one doesn't exist. There are various ways round this. The easiest is to subset the data frame you pass to lm in the first place, which actually requires less typing than using subset anyway:
lapply(label_res, function(x) {
lm(form, savings[!row.names(savings) %in% x,])
})
Another, less satisfactory way if you really want to use the subset argument, is to write x into the calling environment with each iteration
lapply(label_res, function(x) {
x <<- x
lm(form, savings, subset = !row.names(savings) %in% x)
})
I prefer the first option since it doesn't have any side effects in the calling frame, but they both produce the same output:
#> $pop75
#>
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x,
#> ])
#>
#> Coefficients:
#> (Intercept) pop75 dpi ddpi pop15
#> 26.0381140 -1.4257571 -0.0002792 0.3547081 -0.4078521
#>
#>
#> $dpi
#>
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x,
#> ])
#>
#> Coefficients:
#> (Intercept) pop75 dpi ddpi pop15
#> 29.009000 -2.216356 0.000589 0.443409 -0.469873
#>
#>
#> $ddpi
#>
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x,
#> ])
#>
#> Coefficients:
#> (Intercept) pop75 dpi ddpi pop15
#> 24.5240460 -1.2808669 -0.0003189 0.6102790 -0.3914401
#>
#>
#> $pop15
#>
#> Call:
#> lm(formula = form, data = savings[!row.names(savings) %in% x,
#> ])
#>
#> Coefficients:
#> (Intercept) pop75 dpi ddpi pop15
#> 20.580204 -0.644662 -0.000448 0.516628 -0.310588
Created on 2022-06-12 by the reprex package (v2.0.1)
Suppose in R I have multiple GLM objects from multiple glm() function calls.
glm_01
glm_02
...
glm_nn
...and suppose that I want to do all possible pairwise comparisons using a chi-squared or F ANOVA test.
anova(glm_01, glm_02, test = "F")
anova(glm_01, glm_03, test = "F")
anova(glm_01, glm_04, test = "F")
...
I don't want to do this manually because the list of models is quite long. Instead I'd like to grab a list of relevant model objects (anything starting with "glm_") and do all pairwise comparisons automatically. However I'm unsure how to pass the model objects (rather than their names in string form) to the anova() function.
As a simple example:
data(mtcars)
# create some models
glm_01 <- glm(mpg ~ cyl , mtcars, family = gaussian())
glm_02 <- glm(mpg ~ cyl + disp , mtcars, family = gaussian())
glm_03 <- glm(mpg ~ cyl + disp + hp , mtcars, family = gaussian())
glm_04 <- glm(mpg ~ cyl + disp + hp + wt, mtcars, family = gaussian())
# get list of relevant model objects from the R environment
model_list <- ls()
model_list <- model_list[substr(model_list, 1, 4) == "glm_"]
# create a table to store the pairwise ANOVA results
n_models <- length(model_list)
anova_table <- matrix(0, nrow = n_models, ncol = n_models)
# loop through twice and do pairwise comparisons
for(row_index in 1:n_models) {
for(col_index in 1:n_models) {
anova_table[row_index, col_index] <- anova(model_list[row_index], model_list[col_index], test = "F")$'Pr(>F)'[2]
}
}
...but of course this loop at the end doesn't work because I'm not passing model objects to anova(), I'm passing the names of the objects as strings instead. How do I tell anova() to use the object that the string refers to, instead of the string itself?
Thank you.
======================
Possible solution:
data(mtcars)
glm_list <- list()
glm_list$glm_01 <- glm(mpg ~ cyl , mtcars, family = gaussian())
glm_list$glm_02 <- glm(mpg ~ cyl + disp , mtcars, family = gaussian())
glm_list$glm_03 <- glm(mpg ~ cyl + disp + hp , mtcars, family = gaussian())
glm_list$glm_04 <- glm(mpg ~ cyl + disp + hp + wt, mtcars, family = gaussian())
# create a table to store the pairwise ANOVA results
n_models <- length(glm_list)
anova_table <- matrix(0, nrow = n_models, ncol = n_models)
# loop through twice and do pairwise comparisons
row_idx <- 0
col_idx <- 0
for(row_glm in glm_list)
{
row_idx <- row_idx + 1
for(col_glm in glm_list)
{
col_idx <- col_idx + 1
anova_table[row_idx, col_idx] <- anova(row_glm, col_glm, test = "F")$'Pr(>F)'[2]
}
col_idx <- 0
}
row_idx <- 0
The easiest way to do this would be to keep all your models in a list. This makes it simple to iterate over them. For example, you can create all of your models and do a pairwise comparison between all of them like this:
data(mtcars)
f_list <- list(mpg ~ cyl,
mpg ~ cyl + disp,
mpg ~ cyl + disp + hp,
mpg ~ cyl + disp + hp + wt)
all_glms <- lapply(f_list, glm, data = mtcars, family = gaussian)
all_pairs <- as.data.frame(combn(length(all_glms), 2))
result <- lapply(all_pairs, function(i) anova(all_glms[[i[1]]], all_glms[[i[2]]]))
Which gives you:
result
#> $V1
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 29 270.74 1 37.594
#>
#> $V2
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp + hp
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 28 261.37 2 46.965
#>
#> $V3
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 30 308.33
#> 2 27 170.44 3 137.89
#>
#> $V4
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp
#> Model 2: mpg ~ cyl + disp + hp
#> Resid. Df Resid. Dev Df Deviance
#> 1 29 270.74
#> 2 28 261.37 1 9.3709
#>
#> $V5
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 29 270.74
#> 2 27 170.44 2 100.3
#>
#> $V6
#> Analysis of Deviance Table
#>
#> Model 1: mpg ~ cyl + disp + hp
#> Model 2: mpg ~ cyl + disp + hp + wt
#> Resid. Df Resid. Dev Df Deviance
#> 1 28 261.37
#> 2 27 170.44 1 90.925
Created on 2020-08-25 by the reprex package (v0.3.0)
If you want to reference arbitrary objects in an accessible environment by symbol without putting them into a list object, the standard way to return the top object on the search list whose symbol is equal to a string is get(), or the vector equivalent mget(). I.e. get("glm_01") gets you the top object on the search list that has the symbol glm_01. The most minimal modification to your approach would be to wrap your calls to model_list[row_index] and model_list[col_index] in get().
You can be more precise about where to look for objects by assigning the models in a named environment and only getting from that environment (using the envir parameter to get()).
I'm using the moderndrive package to calculate a linear regression but using a function. I am trying to create a function where i can just pass in two selected columns(e.g deaths & cases, titles of the columns) from my data frame (Rona_2020). Below is the function...
score_model_Fxn <- function(y, x){
score_mod <- lm(y ~ x, data = Rona_2020)
Reg_Table <- get_regression_table(score_mod)
print(paste('The regression table is', Reg_Table))
}
when I run the function ...
score_model_Fxn(deaths, cases)
I get ...
Error in eval(predvars, data, env) : object 'deaths' not found
What should i do? I have looked several similar issues but to no avail.
What you want to do by passing deaths and cases is called non-standard evaluation. You need to combine this with computing on the language if you want to run a model with the correct formula and scoping. Computing on the language can be done with substitute and bquote.
library(moderndive)
score_model_Fxn <- function(y, x, data){
#get the symbols passed as arguments:
data <- substitute(data)
y <- substitute(y)
x <- substitute(x)
#substitute them into the lm call and evaluate the call:
score_mod <- eval(bquote(lm(.(y) ~ .(x), data = .(data))))
Reg_Table <- get_regression_table(score_mod)
message('The regression table is') #better than your paste solution
print(Reg_Table)
invisible(score_mod) #a function should always return something useful
}
mod <- score_model_Fxn(Sepal.Length, Sepal.Width, iris)
#The regression table is
## A tibble: 2 x 7
# term estimate std_error statistic p_value lower_ci upper_ci
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 intercept 6.53 0.479 13.6 0 5.58 7.47
#2 Sepal.Width -0.223 0.155 -1.44 0.152 -0.53 0.083
print(mod)
#
#Call:
#lm(formula = Sepal.Length ~ Sepal.Width, data = iris)
#
#Coefficients:
#(Intercept) Sepal.Width
# 6.5262 -0.2234
You could have the function return Reg_Table instead if you prefer.
One of the coolest ways of doing this is using the new recipes package to generate the formula for us and then manipulating a tibble to produce or result
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(moderndive)
score_model_Fxn <- function(df,x, y){
formula_1 <- df %>%
recipe() %>%
update_role({{x}},new_role = "outcome") %>%
update_role({{y}},new_role = "predictor") %>%
formula()
Reg_Table <- mtcars %>%
summarise(score_mod = list(lm(formula_1,data = .))) %>%
rowwise() %>%
mutate(Reg_Table = list(get_regression_table(score_mod))) %>%
pull(Reg_Table)
print(paste('The regression table is', Reg_Table))
Reg_Table
}
k <- mtcars %>%
score_model_Fxn(x = cyl,y = gear)
#> [1] "The regression table is list(term = c(\"intercept\", \"gear\"), estimate = c(10.585, -1.193), std_error = c(1.445, 0.385), statistic = c(7.324, -3.101), p_value = c(0, 0.004), lower_ci = c(7.633, -1.978), upper_ci = c(13.537, -0.407))"
k
#> [[1]]
#> # A tibble: 2 x 7
#> term estimate std_error statistic p_value lower_ci upper_ci
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 intercept 10.6 1.44 7.32 0 7.63 13.5
#> 2 gear -1.19 0.385 -3.10 0.004 -1.98 -0.407
Created on 2020-06-09 by the reprex package (v0.3.0)
For those that might be interested...I modified Bruno's answer.
library(tidyverse); library(recipes); library(moderndive)
score_model_Fxn2 <- function(df,x, y){
formula_1 <- df %>%
recipe() %>%
update_role({{y}},new_role = "outcome") %>%
update_role({{x}},new_role = "predictor") %>%
formula()
Reg_Table <- df %>%
summarise(score_mod = list(lm(formula_1,data = .))) %>%
rowwise() %>%
mutate(Reg_Table = list(get_regression_table(score_mod))) %>%
pull(Reg_Table)
print(Reg_Table)
}
score_model_Fxn2()
Does anyone know how to change to the format of t-stats in stargazer? I tried a bunch of things but haven't had any luck.
I would like the t-statistics shown below the coefficient and in brackets? i.e. drop the "t =" and replace with the t-statistic being shown in inside ( xxxx)
For example:
(1)
Variable 1 0.102
t = 3.494
I would like
(1)
Variable 1 0.102
(3.494)
If you are interested in a non-stargazer option, you might want to try the modelsummary package (disclaimer: I am the author). modelsummary accepts strings enclosed in curly braces in the glue package format, so you can do a bunch of weird things. You can read the details at this link, but here's an example:
library(modelsummary)
models <- list(
lm(hp ~ mpg, data = mtcars),
lm(hp ~ mpg + drat, data = mtcars))
modelsummary(models,
statistic = c(
"statistic",
"conf.int",
"Std. Error: {std.error}{stars}"))
There is no real option to do this with stargazer as the format of t-statistics is hard-coded.
Instead, replace the standard errors with the t-statistics and override p-values so the right stars appear.
I did this for multiple models below, as this is the more general solution (it works for one model too).
library(stargazer)
#>
#> Please cite as:
#> Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
#> R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
models <- list()
models[[1]] <- lm(mpg ~ cyl + disp, data = mtcars)
models[[2]] <- lm(mpg ~ cyl + disp + wt, data = mtcars)
get_ts <- function(fm) {
summary(fm)$coefficients[,3]
}
get_pvals <- function(fm) {
summary(fm)$coefficients[,4]
}
ts <- lapply(models, get_ts)
pvals <- lapply(models, get_pvals)
stargazer(models, type = "text", report=('vc*s'), se = ts, p = pvals)
#>
#> =================================================================
#> Dependent variable:
#> ---------------------------------------------
#> mpg
#> (1) (2)
#> -----------------------------------------------------------------
#> cyl -1.587** -1.785***
#> (-2.230) (-2.940)
#>
#> disp -0.021* 0.007
#> (-2.007) (0.631)
#>
#> wt -3.636***
#> (-3.495)
#>
#> Constant 34.661*** 41.108***
#> (13.609) (14.462)
#>
#> -----------------------------------------------------------------
#> Observations 32 32
#> R2 0.760 0.833
#> Adjusted R2 0.743 0.815
#> Residual Std. Error 3.055 (df = 29) 2.595 (df = 28)
#> F Statistic 45.808*** (df = 2; 29) 46.424*** (df = 3; 28)
#> =================================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
Created on 2021-05-11 by the reprex package (v2.0.0)
Can you give a minimal reproduciable example? I think there is no "t = " as default.
library(stargazer)
model <- lm(mpg ~ cyl + disp, data=mtcars)
stargazer(model, type="text")
===============================================
Dependent variable:
---------------------------
mpg
-----------------------------------------------
cyl -1.587**
(0.712)
disp -0.021*
(0.010)
Constant 34.661***
(2.547)
-----------------------------------------------
Observations 32
R2 0.760
Adjusted R2 0.743
Residual Std. Error 3.055 (df = 29)
F Statistic 45.808*** (df = 2; 29)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
I would like to make changes to my path diagram that I made with the lavaan and semPlot packages.
require(lavaan); require(semPlot)
head(mtcars)
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
semPaths(fit, "std", fade = F, residuals = F)
Because mpg <- gear and mpg <- cyl are not significant, I would like to have it displayed in a transparent way (e.g., adding * to the significant pathlines or preventing from non-significant pathlines from showing up on a path diagram). Is there any way to do that?
Thank you for your support!
I know it's an old thread but I found it while looking for this, and figured I should provide my solution for others.
require(lavaan); require(semPlot) ; require(tidyverse)
#> Loading required package: lavaan
#> This is lavaan 0.6-3
#> lavaan is BETA software! Please report any bugs.
#> Loading required package: semPlot
#> Registered S3 methods overwritten by 'huge':
#> method from
#> plot.sim BDgraph
#> print.sim BDgraph
#> Loading required package: tidyverse
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
# got this warning, but simply ignored it.
#> Warning in lav_partable_check(lavpartable, categorical =
#> lavoptions$categorical, : lavaan WARNING: parameter table does not contain
#> thresholds
lavaan::standardizedSolution(fit) %>% dplyr::filter(!is.na(pvalue)) %>% arrange(desc(pvalue)) %>% mutate_if("is.numeric","round",3) %>% select(-ci.lower,-ci.upper,-z)
#> lhs op rhs est.std se pvalue
#> 1 mpg ~ gear 0.022 0.087 0.801
#> 2 mpg ~ cyl -0.166 0.260 0.524
#> 3 mpg ~ hp -0.694 0.242 0.004
#> 4 hp ~~ hp 0.101 0.034 0.003
#> 5 hp ~1 -2.674 0.600 0.000
#> 6 hp ~ disp 0.444 0.094 0.000
#> 7 hp ~ cyl 0.529 0.098 0.000
#> 8 mpg ~1 4.514 0.751 0.000
#> 9 mpg ~~ mpg 0.258 0.039 0.000
pvalue_cutoff <- 0.05
obj <- semPlot:::semPlotModel(fit)
# save a copy of the original, so we can compare it later and be sure we removed only what we intended to remove
original_Pars <- obj#Pars
check_Pars <- obj#Pars %>% dplyr::filter(!(edge %in% c("int","<->") | lhs == rhs)) # this is the list of paramater to sift thru
keep_Pars <- obj#Pars %>% dplyr::filter(edge %in% c("int","<->") | lhs == rhs) # this is the list of paramater to keep asis
test_against <- lavaan::standardizedSolution(fit) %>% dplyr::filter(pvalue < pvalue_cutoff, rhs != lhs)
test_against_rev <- test_against %>% rename(rhs2 = lhs, # for some reason, the rhs and lhs are reversed in the standardizedSolution() output, for some of the values
lhs = rhs) %>% # I'll have to reverse it myself, and test against both orders
rename(rhs = rhs2)
checked_Pars <-
check_Pars %>% semi_join(test_against, by = c("lhs", "rhs")) %>% bind_rows(
check_Pars %>% semi_join(test_against_rev, by = c("lhs", "rhs"))
)
obj#Pars <- keep_Pars %>% bind_rows(checked_Pars)
#let's verify by looking at the list of the edges we removed from the object
anti_join(original_Pars,obj#Pars)
#> Joining, by = c("label", "lhs", "edge", "rhs", "est", "std", "group", "fixed", "par")
#> label lhs edge rhs est std group fixed par
#> 1 gear ~> mpg 0.1582792 0.0218978 FALSE 2
#> 2 cyl ~> mpg -0.4956938 -0.1660012 FALSE 3
# great, let's plot
semPlot::semPaths(obj, "std",fade = F, residuals = F)
Note this is highly tinkered, and the criterion for exclusion should be modified to your needs (especially the (edge %in% c("int","<->") parts)
Created on 2019-07-09 by the reprex package (v0.3.0)
redacted session_info()
#> lavaan * 0.6-3 2018-09-22 [1] CRAN (R 3.6.0)
#> semPlot * 1.1.1 2019-04-05 [1] CRAN (R 3.6.0)
#> tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.0)
I have recently discovered the lavaanPlot package, which allows to show the coefficients for a specified significance criteria. The code is:
require(lavaan); require(lavaanPlot)
head(mtcars)
model <-'
mpg ~ hp + gear + cyl
hp ~ cyl + disp
'
fit <- sem(model, "std", data = mtcars)
sem.model <- lavaanPlot(model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, sig = 0.05)
The resulting image looks like this:
I believe it can be further customized.