I want to save the regression output of lmer() from the lme4 R package. Is there a good way to export the output below to a table, e.g. .csv, .txt, or .html?
Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)
(Intercept)  103.989      5.617 139.000   18.52  < 2e-16 ***
age           -0.172      0.177 139.000   -1.03    0.304
bmi            0.597      0.229 139.000    2.56    0.012 *
gender         1.019      0.325 139.000    3.15    0.002 **
I tried tab_model() from the sjPlot package, but it does not give the SE, df, and t values. I would like to save the output above. I appreciate any advice.
Make sure the class of your model object is lmerMod, and it will work with stargazer (a model fitted through lmerTest has class lmerModLmerTest instead, which is what the commented-out line below is for). stargazer exports nicely formatted regression tables to plain text, HTML, LaTeX, etc., and has many options for customizing those tables (see the docs).
# class(mod)<- "lmerMod"
mod <- lme4::lmer(Ozone ~ Temp + (1|Month),
data = airquality)
stargazer::stargazer(mod)
stargazer::stargazer(mod, type = "html")
Update: to write to a text file:
library(lme4)
m1 <- lmer(drat ~ wt + (1 + wt|cyl), data=mtcars)
library(broom.mixed)
library(dplyr)
df<- m1 %>%
tidy()
write.table(df,"filename.txt",sep="\t",row.names=FALSE)
OR
m1 %>%
tidy() %>%
write.table(.,"filename.txt",sep="\t",row.names=FALSE)
"effect" "group" "term" "estimate" "std.error" "statistic"
"fixed" NA "(Intercept)" 4.67281034450577 0.344833957358875 13.5508996280279
"fixed" NA "wt" -0.344238767944164 0.0911701519816392 -3.77578363600283
"ran_pars" "cyl" "sd__(Intercept)" 0.374914148920673 NA NA
"ran_pars" "cyl" "cor__(Intercept).wt" -1 NA NA
"ran_pars" "cyl" "sd__wt" 0.0839046849277359 NA NA
"ran_pars" "Residual" "sd__Observation" 0.370192153038516 NA NA
One way is to use the broom.mixed package, as suggested by @user63230 in the comments section.
Here is an example:
library(lme4)
m1 <- lmer(drat ~ wt + (1 + wt|cyl), data=mtcars)
library(broom.mixed)
library(dplyr)
m1 %>%
tidy()
effect group term estimate std.error statistic
<chr> <chr> <chr> <dbl> <dbl> <dbl>
1 fixed NA (Intercept) 4.67 0.345 13.6
2 fixed NA wt -0.344 0.0912 -3.78
3 ran_pars cyl sd__(Intercept) 0.375 NA NA
4 ran_pars cyl cor__(Intercept).wt -1 NA NA
5 ran_pars cyl sd__wt 0.0839 NA NA
6 ran_pars Residual sd__Observation 0.370 NA NA
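If the goal is only the fixed-effects table as a file, a minimal base-R sketch is to take the coefficient matrix from summary() and write it out. The model reuses the airquality example from above; the file name fixed_effects.csv and the use of a temporary directory are placeholders chosen for illustration. Fitting the model with lmerTest::lmer() instead of lme4::lmer() should add the df and Pr(>|t|) columns to the same matrix.

```r
library(lme4)

# Example model (same as in the stargazer answer)
mod <- lmer(Ozone ~ Temp + (1 | Month), data = airquality)

# Coefficient matrix of the fixed effects: Estimate, Std. Error, t value
fixed <- as.data.frame(coef(summary(mod)))

# Keep the term names as a regular column so they survive the export
fixed <- cbind(term = rownames(fixed), fixed)

# Placeholder output location; replace with your own path
out_file <- file.path(tempdir(), "fixed_effects.csv")
write.csv(fixed, out_file, row.names = FALSE)
```

Using write.table(fixed, out_file, sep = "\t", row.names = FALSE) instead gives a tab-separated .txt file.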
I want to preface this with heaps of appreciation for gtsummary -- wonderful package.
After using tidymodels, GLM, and gtsummary for a while, I've been trying to understand gtsummary's computations for GLM model performance and confidence intervals.
Can anyone, and/or Dr. Sjoberg and the gtsummary team, explain questions 1 and 2 below?
Question 1: Why are the standard errors different when using broom::tidy() vs. parameters::model_parameters() to extract the model's estimates?
(Bolded text in the printouts shows the differences.)
library(gtsummary)
library(parameters)
library(rsample)
library(broom)
library(dplyr) # %>%, select
library(tidyr) # drop_na
trial2 <- trial %>% select(age, grade, response, trt) %>%
drop_na()
model_trial2 <- glm(response ~ age + grade + trt,
data = trial2,
family=binomial(link="logit"))
broom::tidy(model_trial2, exponentiate = TRUE)
# # A tibble: 5 × 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 0.184 **0.630** -2.69 0.00715
# 2 age 1.02 0.0114 1.67 0.0952
# 3 gradeII 0.852 **0.395** -0.406 0.685
# 4 gradeIII 1.01 0.385 0.0199 0.984
# 5 trtDrug B 1.13 **0.321** 0.387 0.699
preadmission_model_parameters <- model_trial2 %>% parameters::model_parameters(exponentiate = TRUE)
preadmission_model_parameters
# Parameter | Odds Ratio | SE | 95% CI | z | p
# ---------------------------------------------------------------
# (Intercept) | 0.18 | **0.12** | [0.05, 0.61] | -2.69 | 0.007
# age | 1.02 | 0.01 | [1.00, 1.04] | 1.67 | 0.095
# grade [II] | 0.85 | **0.34** | [0.39, 1.85] | -0.41 | 0.685
# grade [III] | 1.01 | 0.39 | [0.47, 2.15] | 0.02 | 0.984
# trt [Drug B] | 1.13 | **0.36** | [0.60, 2.13] | 0.39 | 0.699
Question 2: (a) What method does gtsummary use to produce confidence intervals? (b) Can the user specify (stratified or unstratified) k-fold cross-validation or bootstrapping to produce the confidence intervals?
(Bold marks the differences between the bootstrapped confidence intervals from reg_intervals() and the confidence intervals from gtsummary's tbl_regression(), whose method I don't know.)
library(gtsummary)
library(parameters)
library(rsample)
library(broom)
library(dplyr) # %>%, select, mutate, across
library(tidyr) # drop_na
trial2 <- trial %>% select(age, grade, response, trt) %>%
drop_na()
bootstraps(trial2, times = 10)
trial_bootrapped_confidence_intervals <- reg_intervals(response ~ age + grade + trt,
data = trial2,
model_fn = "glm",
keep_reps = TRUE,
family=binomial(link="logit"))
trial_bootrapped_confidence_intervals_exp <- trial_bootrapped_confidence_intervals %>%
select(term:.alpha) %>%
mutate(across(.cols = c(.lower, .estimate, .upper), ~exp(.))) %>%
as_tibble()
trial_bootrapped_confidence_intervals_exp
# # A tibble: 4 × 5
# term .lower .estimate .upper .alpha
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 age 0.997 1.02 1.04 0.05
# 2 gradeII **0.400** 0.846 **1.86** 0.05
# 3 gradeIII 0.473 1.01 2.10 0.05
# 4 trtDrug B 0.600 1.14 2.22 0.05
model_trial2_tbl_regression <-
glm(response ~ age + grade + trt,
data = trial2,
family=binomial(link="logit")) %>%
tbl_regression(
exponentiate = TRUE
) %>%
add_global_p()
model_trial2_tbl_regression_metrics <- model_trial2_tbl_regression$table_body %>%
select(
label,
estimate,
std.error,
statistic,
conf.low ,
conf.high,
p.value
)
model_trial2_tbl_regression_metrics
# A tibble: 8 × 7
# label estimate std.error statistic conf.low conf.high p.value
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Age 1.02 0.0114 1.67 0.997 1.04 0.0909
# 2 Grade NA NA NA NA NA 0.894
# 3 I NA NA NA NA NA NA
# 4 II 0.852 0.395 -0.406 **0.389** **1.85** NA
# 5 III 1.01 0.385 0.0199 0.472 2.15 NA
# 6 Chemotherapy Treatment NA NA NA NA NA 0.699
# 7 Drug A NA NA NA NA NA NA
# 8 Drug B 1.13 0.321 0.387 0.603 2.13 NA
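The difference comes from the exponentiation (requested here with exponentiate = TRUE, which turns the logit coefficients into odds ratios). broom::tidy() exponentiates the estimates but leaves the standard errors on the log-odds scale, whereas parameters::model_parameters() also transforms the standard errors. You can see this by comparing broom::tidy(model_trial2, exponentiate = TRUE) with broom::tidy(model_trial2, exponentiate = FALSE), which return the same standard errors. parameters::model_parameters(exponentiate = TRUE) and parameters::model_parameters(exponentiate = FALSE) return different standard errors, and when exponentiate is FALSE for parameters, the standard errors match broom's. The printed values are consistent with the log-scale standard error being multiplied by the odds ratio: for the intercept, 0.184 * 0.630 is about 0.12. This is discussed in the issue "Check exponentiate behavior in tidy methods" (#422).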
To create a custom tidier for gtsummary, see FAQ + Gallery
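The relationship between the two sets of standard errors can be checked with base R alone. The sketch below is an illustration under assumptions: it uses a small logistic model on mtcars rather than the trial data, and it assumes the transformed standard error is the log-scale standard error multiplied by the odds ratio, which is what the printouts in the question suggest.

```r
# Small logistic regression on a built-in data set (illustrative only)
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
est <- coef(summary(fit))

log_or <- est[, "Estimate"]    # coefficients on the log-odds scale
se_log <- est[, "Std. Error"]  # standard errors on the log-odds scale

odds_ratio <- exp(log_or)       # what exponentiate = TRUE reports as the estimate
se_or <- odds_ratio * se_log    # rescaled standard error on the odds-ratio scale

# The z statistic is computed on the log-odds scale in both cases
z_stat <- log_or / se_log

comparison <- data.frame(
  term = rownames(est),
  odds_ratio = odds_ratio,
  se_log_scale = se_log,  # the kind of value broom::tidy() prints
  se_or_scale = se_or,    # the kind of value model_parameters() prints
  z = z_stat,
  row.names = NULL
)
comparison
```

Because the test statistic and p-value are computed on the log-odds scale either way, the two packages agree on everything except the standard error column.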
I have a list (fit) that contains the linear regression outputs (S3 class lm) of 471 different models. I am trying to extract the standard error of a specific variable in each model, but I'm unsure how to do so. Can anyone help? Specifically, I want to extract the standard error for the variable "p" for EACH of the 471 models saved in the "fit" object.
varnames = names(merged1)[2036:2507]
fit <- lapply(varnames,
FUN=function(p) lm(formula(paste("Dx ~ x + y + z + q +", p)),data=merged1))
names(fit) <- varnames
Thank you so much!
Note
Edited to reflect the anonymous function p, rather than x, as stated previously.
Using fit, which is computed reproducibly in the Note at the end, call map_dfr() on it with tidy(). This gives a data frame containing the coefficients and associated statistics for every model; we then filter for the rows we want.
library(broom) # tidy
library(dplyr)
library(purrr) # map_dfr
fit %>%
map_dfr(tidy, .id = "variable") %>%
filter(term == variable)
giving:
# A tibble: 8 x 6
variable term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 hp hp -0.0147 0.0147 -1.00 0.325
2 drat drat 1.21 1.50 0.812 0.424
3 wt wt -3.64 1.04 -3.50 0.00160
4 qsec qsec -0.243 0.402 -0.604 0.551
5 vs vs -0.634 1.90 -0.334 0.741
6 am am 1.93 1.34 1.44 0.161
7 gear gear 0.158 0.910 0.174 0.863
8 carb carb -0.737 0.393 -1.88 0.0711
Note
We compute fit reproducibly using mtcars which is built into R.
data <- mtcars
resp <- "mpg" # response
fixed <- c("cyl", "disp") # always include these
varnames <- setdiff(names(data), c(resp, fixed)) # incl one at a time
fit <- Map(function(v) {
fo <- reformulate(c(fixed, v), resp)
lm(fo, data)
}, varnames)
Updated
Significantly revised.
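mapply(function(m, p) summary(m)$coefficients[p, "Std. Error"], fit, names(fit))
Here p is the name of each model's added variable (taken from names(fit)), and the "Std. Error" column, the second column of the coefficient matrix, holds the standard error for that variable.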
My dataset looks like this
df = data.frame(site=c(rep('A',95),rep('B',110),rep('C',250)),
nps_score=c(floor(runif(455, min=0, max=10))),
service_score=c(floor(runif(455, min=0, max=10))),
food_score=c(floor(runif(455, min=0, max=10))),
clean_score=c(floor(runif(455, min=0, max=10))))
I'd like to run a linear model on each group (i.e. for each site), and produce the coefficients for each group in a dataframe, along with the significance levels of each variable.
I am trying to group_by the site variable and then run the model for each site, but it doesn't seem to be working. I've looked at some existing solutions on Stack Overflow but cannot seem to adapt the code to my problem.
# Trying to run this by group, and output the resulting coefficients per site in a separate df with their significance levels.
library(MASS)
summary(ols <- rlm(nps_score ~ ., data = df))
Any help on this would be greatly appreciated
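library(tidyverse)
library(broom)
library(MASS) # loaded after tidyverse, so MASS::select() masks dplyr::select()
# We first create a formula object from every column except site and the response.
# dplyr::select() is called explicitly because of the masking noted above.
my_formula <- as.formula(paste("nps_score ~", paste(df %>% dplyr::select(-site, -nps_score) %>% names(), collapse = " + ")))
# Now we can group by site and use the formula object within the pipe.
results <- df %>%
  group_by(site) %>%
  do(tidy(rlm(my_formula, data = .)))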
which gives:
# A tibble: 12 x 5
# Groups: site [3]
site term estimate std.error statistic
<chr> <chr> <dbl> <dbl> <dbl>
1 A (Intercept) 5.16 0.961 5.37
2 A service_score -0.0656 0.110 -0.596
3 A food_score -0.0213 0.102 -0.209
4 A clean_score -0.0588 0.110 -0.536
5 B (Intercept) 2.22 0.852 2.60
6 B service_score 0.221 0.103 2.14
7 B food_score 0.163 0.104 1.56
8 B clean_score -0.0383 0.0928 -0.413
9 C (Intercept) 5.47 0.609 8.97
10 C service_score -0.0367 0.0721 -0.509
11 C food_score -0.0585 0.0724 -0.808
12 C clean_score -0.0922 0.0691 -1.33
Note: I'm not familiar with the rlm function or whether it provides p-values in the first place, but at least the tidy function doesn't return p-values for rlm. If an ordinary linear regression suits your needs, you could replace rlm with lm, in which case a sixth column with p-values would be added.
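For the lm variant mentioned in the note, the per-site table can also be built without any packages. This is a sketch rather than the answer's method: it regenerates the question's random data with an arbitrary seed (so the numbers will not match the output above) and stacks the coefficient matrices from summary() for each site.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
df <- data.frame(
  site = rep(c("A", "B", "C"), times = c(95, 110, 250)),
  nps_score = floor(runif(455, min = 0, max = 10)),
  service_score = floor(runif(455, min = 0, max = 10)),
  food_score = floor(runif(455, min = 0, max = 10)),
  clean_score = floor(runif(455, min = 0, max = 10))
)

# One data frame per site
groups <- split(df, df$site)

# Fit lm() per site and keep the coefficient table, including p-values
per_site <- lapply(names(groups), function(s) {
  model <- lm(nps_score ~ service_score + food_score + clean_score,
              data = groups[[s]])
  tab <- coef(summary(model))
  data.frame(site = s, term = rownames(tab), tab,
             row.names = NULL, check.names = FALSE)
})

# Stack the per-site tables into one data frame
results <- do.call(rbind, per_site)
results
```

The result has one row per site and term, with the columns Estimate, Std. Error, t value and Pr(>|t|) taken directly from summary().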
Is there any package that can help me export the results of a multinomial logit model to Excel, for example as a table?
The broom package does a reasonable job of tidying multinomial output.
library(broom)
library(nnet)
fit.gear <- multinom(gear ~ mpg + factor(am), data = mtcars)
summary(fit.gear)
Call:
multinom(formula = gear ~ mpg + factor(am), data = mtcars)
Coefficients:
(Intercept) mpg factor(am)1
4 -11.15154 0.5249369 11.90045
5 -18.39374 0.3662580 22.44211
Std. Errors:
(Intercept) mpg factor(am)1
4 5.317047 0.2680456 66.895845
5 67.931319 0.2924021 2.169944
Residual Deviance: 28.03075
AIC: 40.03075
tidy(fit.gear)
# A tibble: 6 x 6
y.level term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 4 (Intercept) 1.44e-5 5.32 -2.10 3.60e- 2
2 4 mpg 1.69e+0 0.268 1.96 5.02e- 2
3 4 factor(am)1 1.47e+5 66.9 0.178 8.59e- 1
4 5 (Intercept) 1.03e-8 67.9 -0.271 7.87e- 1
5 5 mpg 1.44e+0 0.292 1.25 2.10e- 1
6 5 factor(am)1 5.58e+9 2.17 10.3 4.54e-25
Then use the openxlsx package to send that to Excel.
library(openxlsx)
write.xlsx(file="E:/.../fitgear.xlsx", tidy(fit.gear))
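(Note that, in the broom version used here, the tidy function exponentiates the coefficients by default, although the help page incorrectly says the default is FALSE. So these are relative risk ratios, which is why they don't match the output of summary. Passing exponentiate = TRUE or exponentiate = FALSE explicitly avoids relying on the default. And if you want confidence intervals, you have to ask for them with conf.int = TRUE.)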
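If broom or openxlsx is not available, roughly the same table can be assembled from summary() alone and written to a CSV file, which Excel opens directly. This is a sketch under assumptions: the column names mirror the tidy() output shown above, the rrr column and the file name fitgear.csv are my own choices, and the p-values are two-sided Wald tests on the log scale, which agrees with the statistic and p.value columns printed above.

```r
library(nnet)

fit.gear <- multinom(gear ~ mpg + factor(am), data = mtcars, trace = FALSE)
s <- summary(fit.gear)

log_coef <- s$coefficients     # rows: outcome levels, columns: terms
se <- s$standard.errors
z <- log_coef / se             # Wald z statistics

# as.vector() unrolls the matrices column by column (term by term)
tab <- data.frame(
  y.level = rep(rownames(log_coef), times = ncol(log_coef)),
  term = rep(colnames(log_coef), each = nrow(log_coef)),
  estimate = as.vector(log_coef),       # log scale, as in summary()
  rrr = exp(as.vector(log_coef)),       # relative risk ratios
  std.error = as.vector(se),
  statistic = as.vector(z),
  p.value = 2 * pnorm(-abs(as.vector(z)))
)

# Placeholder output location; replace with your own path
out_file <- file.path(tempdir(), "fitgear.csv")
write.csv(tab, out_file, row.names = FALSE)
```

Keeping both the log-scale estimate and the exponentiated value in the file makes it clear which scale each number is on.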
I'd like to add the reference level to the coefficient table in the output of a linear regression fitted with lm().
For example:
levels(iris$Species)
"setosa" "versicolor" "virginica"
summary(lm(Sepal.Length ~ Petal.Width + Species, iris))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.78044 0.08308 57.543 < 2e-16 ***
Petal.Width 0.91690 0.19386 4.730 5.25e-06 ***
Speciesversicolor -0.06025 0.23041 -0.262 0.794
Speciesvirginica -0.05009 0.35823 -0.140 0.889
I'd like to have it like:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.78044 0.08308 57.543 < 2e-16 ***
Petal.Width 0.91690 0.19386 4.730 5.25e-06 ***
Speciessetosa
Speciesversicolor -0.06025 0.23041 -0.262 0.794
Speciesvirginica -0.05009 0.35823 -0.140 0.889
I've been searching for a while but have found no clues yet. Any help would be highly appreciated.
#EDIT
Data for further expansion:
iris$Petal.Width <- as.factor(ifelse(iris$Petal.Width >1, "Big", "Small"))
levels(iris$Petal.Width)
"Big" "Small"
Here is a basic workflow you can work from. It uses dplyr and broom to join your factor levels with your coefficients table. Right now it requires that you know which variables are factors. You could change the NA to "" if you prefer. It also sorts the output alphabetically, which will not always put the reference group first. Let me know if you have any issues scaling this:
library(broom)
library(dplyr)
iris <- datasets::iris
iris$Petal.Width <- factor(ifelse(iris$Petal.Width > 1, "Big", "Small"), levels = c("Small", "Big"))
reg_obj <- lm(Sepal.Length ~ Petal.Width + Species, iris)
factor_levels <- tibble(term = c(paste0("Species", levels(iris$Species)),
paste0("Petal.Width", levels(iris$Petal.Width))))
full_join(tidy(reg_obj), factor_levels, by = "term") %>%
arrange(term)
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 5.01 0.0709 70.6 1.03e-114
2 Petal.WidthBig 0.607 0.204 2.97 3.51e- 3
3 Petal.WidthSmall NA NA NA NA
4 Speciessetosa NA NA NA NA
5 Speciesversicolor 0.408 0.202 2.02 4.55e- 2
6 Speciesvirginica 0.975 0.228 4.28 3.33e- 5
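This produces the desired output by inserting a line into the captured printout. The line indices are hard-coded for this particular model: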
res <- capture.output(summary(lm(Sepal.Length ~ Petal.Width + Species, data = iris)))
res[14:22] <- res[13:21]
res[13] <- "Speciessetosa"
cat(res, sep = "\n")
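For the table-joining approach, the factor terms do not have to be listed by hand. The sketch below is a base-R variation, not either answer's code: it reads the factor levels from the xlevels component that lm() stores on the fitted model, builds the term names the same way lm() does (variable name followed by level), and merges them with the coefficient table so the reference levels appear as rows of NA.

```r
fit <- lm(Sepal.Length ~ Petal.Width + Species, data = datasets::iris)

# Coefficient table with the term names as a column
tab <- as.data.frame(coef(summary(fit)))
tab$term <- rownames(tab)

# All factor terms, including reference levels, e.g. "Speciessetosa"
all_terms <- unlist(lapply(names(fit$xlevels), function(v) {
  paste0(v, fit$xlevels[[v]])
}))

# Outer join: terms without a coefficient (reference levels) get NA
full <- merge(tab, data.frame(term = all_terms), by = "term", all = TRUE)
full
```

As with the dplyr version, merge() sorts by term, so the reference level is not guaranteed to come first within its factor.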