I've been analyzing data for a paper and have now obtained results from a multiple linear regression. However, the summaries R provides are not really fit for publication in the final paper. I have also specified one variable in several different ways, to showcase the robustness of the results.
How can I create a nice, exportable table in R that contains variable names (ideally also allowing me to rename the variables in a more informative way), estimates, standard errors, robust standard errors, p-values and ideally also the significance indicators? For illustration:
I have summary outputs like this:
Residuals:
Min 1Q Median 3Q Max
-50.868 -4.644 1.583 7.054 20.490
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.710e+01 1.848e+01 2.549 0.0136 *
Var1 -8.588e-01 2.201e+00 -0.390 0.6979
Var2 2.486e+00 1.055e+00 2.357 0.0220 *
log(specification1) 3.376e+00 2.152e+00 1.569 0.1223
Var4 -3.651e-04 2.797e-04 -1.305 0.1971
Var5 4.809e+00 2.654e+00 1.812 0.0753 .
Var6 -8.706e+00 6.972e+00 -1.249 0.2170
Var7 -8.172e+00 5.755e+00 -1.420 0.1612
Var8 -3.276e+00 7.067e+00 -0.463 0.6448
Var9 -1.477e+01 7.849e+00 -1.882 0.0650 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
and
Residuals:
Min 1Q Median 3Q Max
-48.881 -5.699 0.956 8.947 17.888
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.258e+01 1.750e+01 2.405 0.0195 *
Var1 4.298e-01 2.120e+00 0.200 0.8421
Var2 5.179e+00 1.027e+00 2.122 0.0271 *
log(specification 2) 2.050e+00 9.435e-01 2.173 0.0338 *
Var4 -1.420e-04 2.261e-04 -1.513 0.1356
Var5 4.584e+00 2.511e+00 1.826 0.0730 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
and I would like to get to a table looking something like this:
Model1 Model2
Intercept Estimate Std.Error p robust_Std.Error robust_p Estimate Std.Error p robust ...
Var1
Var2
Var3
Var4
Var5
Var6
Var7
Var8
Var9
which of course contains the values of the estimates in the columns. Is there a function/package that does that nicely?
Thanks in advance
I suggest you use the broom package, like this:
fit1 <- lm(mpg ~ ., mtcars)
broom::tidy(fit1)
# # A tibble: 11 x 5
# term estimate std.error statistic p.value
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 (Intercept) 12.3 18.7 0.657 0.518
# 2 cyl -0.111 1.05 -0.107 0.916
# 3 disp 0.0133 0.0179 0.747 0.463
# 4 hp -0.0215 0.0218 -0.987 0.335
# 5 drat 0.787 1.64 0.481 0.635
# 6 wt -3.72 1.89 -1.96 0.0633
# 7 qsec 0.821 0.731 1.12 0.274
# 8 vs 0.318 2.10 0.151 0.881
# 9 am 2.52 2.06 1.23 0.234
# 10 gear 0.655 1.49 0.439 0.665
# 11 carb -0.199 0.829 -0.241 0.812
It extracts a tibble from the output of the lm function.
If you have more than one model and you want to combine all the tibbles on their common terms, you can deal with it this way:
Create a list x of your models.
fit1 <- lm(mpg ~ cyl + disp + gear, mtcars)
fit2 <- lm(mpg ~ cyl + hp + drat, mtcars)
x <- list(fit1, fit2)
You can use this solution:
library(purrr)
library(dplyr)
library(stringr)
# set names for the list
names(x) <- paste0("Model", seq_along(x))
# tidy them up
x <- map(x, broom::tidy)
# set the list names at the beginning of each column
x <- imap(x, ~set_names(.x, paste(.y, names(.x), sep = "_")))
# rename each term column as "term"
x <- map(x, ~rename_with(.x, str_replace, pattern = ".*term", replacement = "term"))
# join them all together
reduce(x, full_join, by = "term")
It returns the output you asked for:
# A tibble: 6 x 9
term Model1_estimate Model1_std.error Model1_statistic Model1_p.value Model2_estimate Model2_std.error Model2_statistic Model2_p.value
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 34.0 4.76 7.13 0.0000000925 22.5 7.99 2.82 0.00880
2 cyl -1.59 0.724 -2.20 0.0366 -1.36 0.735 -1.85 0.0747
3 disp -0.0200 0.0109 -1.83 0.0774 NA NA NA NA
4 gear 0.158 0.910 0.174 0.863 NA NA NA NA
5 hp NA NA NA NA -0.0288 0.0153 -1.88 0.0704
6 drat NA NA NA NA 2.84 1.52 1.87 0.0725
The code remains stable if your list contains more than two models.
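The original question also asked for robust standard errors, which broom::tidy() on a plain lm doesn't produce. One way to add them is via the sandwich and lmtest packages; a sketch, assuming both are installed (broom can tidy a coeftest object):

```r
library(sandwich)  # heteroskedasticity-consistent covariance estimators
library(lmtest)    # coeftest(): re-test coefficients with a custom vcov

fit1 <- lm(mpg ~ cyl + disp + gear, mtcars)

# Robust (HC3) standard errors, t statistics and p-values, tidied
robust <- broom::tidy(coeftest(fit1, vcov. = vcovHC(fit1, type = "HC3")))
names(robust)[3:5] <- c("robust_std.error", "robust_statistic", "robust_p.value")

# Join onto the ordinary tidy() output by term
merge(broom::tidy(fit1),
      robust[, c("term", "robust_std.error", "robust_p.value")],
      by = "term")
```

The merged data frame then has estimate, std.error, p.value plus the robust columns side by side, per term.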
I have a number of linear mixed models, which I have fitted with the lmerTest library so that the summary() of the model provides me with p-values for the fixed effects.
I have written a loop function that extracts the fixed effects of gender:time and of gender:time:explanatory variable of interest.
I am now trying to also extract the p-value of the gender:time fixed effect (step 1) and of the gender:time:explanatory variable (step 2).
Normally I can extract the p-value with this code:
coef(summary(model))[,5]["genderfemale:time"]
But inside the loop function it doesn't work and gives the error: "Error in coef(summary(model))[, 5] : subscript out of bounds"
See code
library(lmerTest)
# Create a list of models with interaction terms to loop over
models <- list(
mixed_age_interaction,
mixed_tnfi_year_interaction,
mixed_crp_interaction
)
# Create a list of explanatory variables to loop over
explanatoryVariables <- list(
"age_at_diagnosis",
"bio_drug_start_year",
"crp"
)
loop_function <- function(models, explanatoryVariables) {
# Create an empty data frame to store the results
coef_df <- data.frame(adj_coef_gender_sex = numeric(), coef_interaction_term = numeric(), explanatory_variable = character(), adj_coef_pvalue = numeric())
# Loop over the models and explanatory variables
for (i in seq_along(models)) {
model <- models[[i]]
explanatoryVariable <- explanatoryVariables[[i]]
# Extract the adjusted coefficients for the gender*time interaction
adj_coef <- fixef(model)["genderfemale:time"]
# Extract the fixed effect of the interaction term
interaction_coef <- fixef(model)[paste0("genderfemale:time:", explanatoryVariable)]
# Extract the p-value for the adjusted coefficient for gender*time
adj_coef_pvalue <- coef(summary(model))[,5]["genderfemale:time"]
# Add a row to the data frame with the results for this model
coef_df <- bind_rows(coef_df, data.frame(adj_coef_gender_sex = adj_coef, coef_interaction_term = interaction_coef, explanatory_variable = explanatoryVariable, adj_coef_pvalue = adj_coef_pvalue))
}
return(coef_df)
}
# Loop over the models and extract the fixed effects
coef_df <- loop_function(models, explanatoryVariables)
coef_df
My question is how can I extract the p-values from the models for gender:time and gender:time:explanatory variable and add them to the final data.frame coef_df?
Also adding a summary of one of the models for reference
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: basdai ~ 1 + gender + time + age_at_diagnosis + gender * time +
time * age_at_diagnosis + gender * age_at_diagnosis + gender *
time * age_at_diagnosis + (1 | ID) + (1 | country)
Data: dat
AIC BIC logLik deviance df.resid
254340.9 254431.8 -127159.5 254318.9 28557
Scaled residuals:
Min 1Q Median 3Q Max
-3.3170 -0.6463 -0.0233 0.6092 4.3180
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 154.62 12.434
country (Intercept) 32.44 5.695
Residual 316.74 17.797
Number of obs: 28568, groups: ID, 11207; country, 13
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 4.669e+01 1.792e+00 2.082e+01 26.048 < 2e-16 ***
genderfemale 2.368e+00 1.308e+00 1.999e+04 1.810 0.0703 .
time -1.451e+01 4.220e-01 2.164e+04 -34.382 < 2e-16 ***
age_at_diagnosis 9.907e-02 2.220e-02 1.963e+04 4.463 8.12e-06 ***
genderfemale:time 1.431e-01 7.391e-01 2.262e+04 0.194 0.8464
time:age_at_diagnosis 8.188e-02 1.172e-02 2.185e+04 6.986 2.90e-12 ***
genderfemale:age_at_diagnosis 8.547e-02 3.453e-02 2.006e+04 2.476 0.0133 *
genderfemale:time:age_at_diagnosis 4.852e-03 1.967e-02 2.274e+04 0.247 0.8052
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
(Intr) gndrfm time ag_t_d gndrf: tm:g__ gnd:__
genderfemal -0.280
time -0.241 0.331
age_t_dgnss -0.434 0.587 0.511
gendrfml:tm 0.139 -0.519 -0.570 -0.293
tm:g_t_dgns 0.228 -0.313 -0.951 -0.533 0.543
gndrfml:g__ 0.276 -0.953 -0.329 -0.639 0.495 0.343
gndrfml::__ -0.137 0.491 0.567 0.319 -0.954 -0.596 -0.516
The internal function get_coefmat of {lmerTest} might be handy:
if fm is an example {lmer} model ...
library("lmerTest")
fm <- lmer(Informed.liking ~ Gender + Information * Product +
(1 | Consumer) + (1 | Consumer:Product),
data=ham
)
... you can obtain the coefficients including p-values as a dataframe like so (note the triple colon to expose the internal function):
df_coeff <- lmerTest:::get_coefmat(fm) |>
as.data.frame()
output:
## > df_coeff
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 5.8490289 0.2842897 322.3364 20.5741844 1.173089e-60
## Gender2 -0.2442835 0.2605644 79.0000 -0.9375169 3.513501e-01
## Information2 0.1604938 0.2029095 320.0000 0.7909626 4.295517e-01
## Product2 -0.8271605 0.3453291 339.5123 -2.3952818 1.714885e-02
## Product3 0.1481481 0.3453291 339.5123 0.4290057 6.681912e-01
## ...
edit
Here's a snippet which will return the extracted coefficients for, e.g., models m1 and m2 as a combined dataframe:
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
list('m1', 'm2') |> ## observe the quotes
map_dfr( ~ list(
model = .x,
coeff = lmerTest:::get_coefmat(get(.x)) |>
as.data.frame() |>
rownames_to_column()
)
) |>
as_tibble() |>
unnest_wider(coeff)
output:
## + # A tibble: 18 x 7
## model rowname Estimate `Std. Error` df `t value` `Pr(>|t|)`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 m1 (Intercept) 5.85 0.284 322. 20.6 1.17e-60
## 2 m1 Gender2 -0.244 0.261 79.0 -0.938 3.51e- 1
## ...
## 4 m1 Product2 -0.827 0.345 340. -2.40 1.71e- 2
## ...
## 8 m1 Information2:Product3 0.272 0.287 320. 0.946 3.45e- 1
## ...
## 10 m2 (Intercept) 5.85 0.284 322. 20.6 1.17e-60
## 11 m2 Gender2 -0.244 0.261 79.0 -0.938 3.51e- 1
## ...
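As for the "subscript out of bounds" error in the question: indexing coef(summary(model)) by a term name that isn't in the matrix fails, so a defensive lookup helps inside a loop. A sketch (using a plain lm as a stand-in; the same pattern works for lmerTest fits, where the p-value is likewise the last column):

```r
# Return the p-value for `term`, or NA if the term is absent from the model
get_p <- function(model, term) {
  cf <- coef(summary(model))
  if (!term %in% rownames(cf)) return(NA_real_)
  cf[term, ncol(cf)]  # p-value sits in the last column
}

fit <- lm(mpg ~ wt + cyl, data = mtcars)
get_p(fit, "wt")          # a real p-value
get_p(fit, "not_a_term")  # NA instead of "subscript out of bounds"
```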
I am running a multivariate regression on ~150 different outcomes. Because gathering the results from individual tables or by hand is tedious, obviously, I have tried to produce a data frame out of the results. So far my steps:
I made a function for the regression:
f1 <- function(X){summary(lm(X~HFres + age + sex + season + nonalcTE, data=dslin))}
I applied apply() to make a list (I only used a few of the 150 outcomes while trying to make it work)
m1 <- apply(dslin[,c(21:49)], MARGIN=2, FUN=f1)
Then I change the object into a dataframe:
m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)
This is the result:
> m2
>A tibble: 29 x 9
> variables `(Intercept)`[,1] [,2] [,3] [,4] HFres[,1] [,2] [,3] [,4]
> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
> 1 C_101_IL8 3.59 0.106 34.0 1.28e-224 0.0000129 0.00367 0.00352 0.997
> 2 C_102_VEGFA 9.28 0.0844 110. 0 0.00425 0.00293 1.45 0.147
> 3 C_103_AM 4.92 0.0820 60.0 0 0.00261 0.00285 0.916 0.360
> 4 C_105_CD40L 7.53 0.164 45.9 0 0.00549 0.00570 0.964 0.335
> 5 C_106_GDF15 6.97 0.0864 80.7 0 0.00196 0.00300 0.653 0.514
> 6 C_107_PlGF 6.25 0.0665 94.0 0 0.00219 0.00231 0.947 0.344
> 7 C_108_SELE 4.89 0.117 41.8 1.14e-321 0.000978 0.00406 0.241 0.810
> 8 C_109_EGF 6.59 0.157 41.9 1.8 e-322 0.00714 0.00546 1.31 0.191
> 9 C_110_OPG 8.21 0.0673 122. 0 0.000320 0.00234 0.137 0.891
>10 C_111_SRC 7.62 0.0511 149. 0 0.000660 0.00177 0.372 0.710
>... with 19 more rows, and 6 more variables: age <dbl[,4]>, sexFemale <dbl[,4]>,
> seasonfall <dbl[,4]>, seasonspring <dbl[,4]>, seasonsummer <dbl[,4]>,
> nonalcTE <dbl[,4]>
It's a bit hard to see here, but initially in m1 I had two columns: one with the variables and one with a list. Then after unnesting I have several columns which still each have 4 columns.
When I export this to Excel (with the rio package) only the [,1] columns show up, because the columns '(Intercept)', HFres, etc. are still nested.
I have tried applying the unnest_wider() command again
m2 %>% unnest_wider(list=c('(Intercept)', 'HFres', 'age', 'sexFemale', 'seasonfall', 'seasonspring', 'seasonsummer'))
This didn't work, because it didn't accept that I want to unnest a list of columns instead of a dataframe.
I then tried it for only one of the variables to start with
m2 %>% unnest_wider(HFres)
This also gave me errors.
So, my remaining problem is I still need to unnest the columns of m2 in order to make them all visible when I export them.
Alternatively, it would be enough for me to have only the [,1] and [,4] subcolumns of each column, if those are easier to extract. I know I can access one subcolumn like this: m2[["age"]][,1], and maybe I could make a new data frame from m2 by extracting all the columns I want?
Thank you for your help!
Update: reprex (I hope this is a correct understanding of what a reprex is)
create dataframe
age <- c(34, 56, 24, 78, 56, 67, 45, 93, 62, 16)
bmi <- c(24, 25, 27, 23, 2, 27, 28, 24, 27, 21)
educ <- c(4,2,5,1,3,2,4,5,2,3)
smoking <- c(1,3,2,2,3,2,1,3,2,1)
HF <- c(3,4,2,4,5,3,2,3,5,2)
P1 <- c(5,4,7,9,5,6,7,3,4,2)
P2 <- c(7,2,4,6,5,3,2,5,6,3)
P3 <- c(6,4,2,3,5,7,3,2,5,6)
df <- data.frame(age, bmi, educ, smoking, HF, P1, P2, P3)
function
f1 <- function(X){summary(lm(X~HF + age + bmi + educ + smoking, data=df))}
apply function to columns
m1 <- apply(df[,c(6:8)], MARGIN=2, FUN=f1)
m2 <- m1 %>% {tibble(variables = names(.),coefficient = map(., "coefficients"))} %>% unnest_wider(coefficient)
I basically need the coefficient (beta) which is the [,1] of each column and the p-value which is the [,4]
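The extraction the question sketches (the m2[["age"]][,1] style) can also be done directly in base R, without unnesting: loop over the list of summaries and keep only columns 1 and 4 of each coefficient matrix. A sketch using mtcars in place of the real data:

```r
# One summary.lm per outcome column, as in the question
f1 <- function(X) summary(lm(X ~ mpg + cyl, data = mtcars))
m1 <- apply(mtcars[, c("disp", "hp")], MARGIN = 2, FUN = f1)

# Keep only the estimate (column 1) and p-value (column 4) per term
res <- do.call(rbind, lapply(names(m1), function(v) {
  cf <- coef(m1[[v]])
  data.frame(outcome = v, term = rownames(cf),
             beta = cf[, 1], p = cf[, 4], row.names = NULL)
}))
res
```

The resulting flat data frame exports cleanly, since nothing in it is nested.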
The broom package is intended for exactly this — turning model results into tidy dataframes. Here’s an example using broom::tidy() to get a table of coefficients for each dv, and purrr::map_dfr() to iterate over dvs, row-bind the coefficient tables, and add a column with the dv for each model:
library(broom)
library(purrr)
f1 <- function(X) {
tidy(lm(
as.formula(paste(X, "~ mpg * cyl")),
data = mtcars
))
}
model_results <- map_dfr(
set_names(names(mtcars)[3:11]),
f1,
.id = "dv"
)
model_results
Output:
# A tibble: 36 x 6
dv term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 disp (Intercept) -242. 154. -1.57 0.128
2 disp mpg 10.3 6.47 1.59 0.123
3 disp cyl 103. 22.9 4.52 0.000104
4 disp mpg:cyl -3.24 1.18 -2.75 0.0104
5 hp (Intercept) -86.5 123. -0.704 0.487
6 hp mpg 4.59 5.16 0.889 0.381
7 hp cyl 50.3 18.2 2.75 0.0102
8 hp mpg:cyl -1.47 0.940 -1.57 0.128
9 drat (Intercept) 3.34 1.28 2.61 0.0145
10 drat mpg 0.0541 0.0538 1.01 0.323
# ... with 26 more rows
If you want dvs in rows and coefficients in columns, you can tidyr::pivot_wider():
library(tidyr)
model_coefs <- pivot_wider(
model_results,
id_cols = dv,
names_from = term,
values_from = estimate
)
model_coefs
Output:
# A tibble: 9 x 5
dv `(Intercept)` mpg cyl `mpg:cyl`
<chr> <dbl> <dbl> <dbl> <dbl>
1 disp -242. 10.3 103. -3.24
2 hp -86.5 4.59 50.3 -1.47
3 drat 3.34 0.0541 -0.0354 -0.00533
4 wt 2.98 -0.00947 0.478 -0.0219
5 qsec 25.0 -0.0938 -0.862 0.000318
6 vs 2.38 -0.0194 -0.292 0.00223
7 am -0.908 0.0702 0.0721 -0.00470
8 gear 4.22 0.0115 -0.181 0.00311
9 carb 3.32 -0.0830 0.249 -0.00333
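If you want the p-values next to the estimates in the wide table (as the question asks), pivot_wider() accepts several values_from columns at once; a self-contained sketch with just two dvs:

```r
library(broom)
library(tidyr)
library(purrr)

# Tidy coefficients for two dvs, stacked with an identifying column
model_results <- map_dfr(
  set_names(c("disp", "hp")),
  function(X) tidy(lm(as.formula(paste(X, "~ mpg * cyl")), data = mtcars)),
  .id = "dv"
)

# One row per dv, estimate_* and p.value_* columns side by side
pivot_wider(model_results, id_cols = dv,
            names_from = term, values_from = c(estimate, p.value))
```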
I want to calculate CIs in mixed models: a zero-inflated negative binomial and a hurdle model. My code for the hurdle model looks like this (x1, x2 continuous, x3 categorical):
m1 <- glmmTMB(count~x1+x2+x3+(1|year/class),
data = bd, zi = ~x2+x3+(1|year/class), family = truncated_nbinom2,
)
I used confint, and I got these results:
ci <- confint(m1,parm="beta_")
ci
2.5 % 97.5 % Estimate
cond.(Intercept) 1.816255e-01 0.448860094 0.285524861
cond.x1 9.045278e-01 0.972083366 0.937697401
cond.x2 1.505770e+01 26.817439186 20.094998772
cond.x3high 1.190972e+00 1.492335046 1.333164894
cond.x3low 1.028147e+00 1.215828654 1.118056377
cond.x3reg 1.135515e+00 1.385833853 1.254445909
class:year.cond.Std.Dev.(Intercept) 2.256324e+00 2.662976154 2.441845815
year.cond.Std.Dev.(Intercept) 1.051889e+00 1.523719169 1.157153015
zi.(Intercept) 1.234418e-04 0.001309705 0.000402085
zi.x2 2.868578e-02 0.166378014 0.069084606
zi.x3high 8.972025e-01 1.805832900 1.272869874
Am I calculating the intervals correctly? And why does only one category of x3 appear in the zi component?
If possible, I would also like to know if it's possible to plot these CIs.
Thanks!
Data looks like this:
class id year count x1 x2 x3
956 5 3002 2002 3 15.6 47.9 high
957 5 4004 2002 3 14.3 47.9 low
958 5 6021 2002 3 14.2 47.9 high
959 4 2030 2002 3 10.5 46.3 high
960 4 2031 2002 3 15.3 46.3 high
961 4 2034 2002 3 15.2 46.3 reg
with x1 and x2 continuous, x3 three level categorical variable (factor)
Summary of the model:
summary(m1)
'giveCsparse' has been deprecated; setting 'repr = "T"' for you'giveCsparse' has been deprecated; setting 'repr = "T"' for you'giveCsparse' has been deprecated; setting 'repr = "T"' for you
Family: truncated_nbinom2 ( log )
Formula: count ~ x1 + x2 + x3 + (1 | year/class)
Zero inflation: ~x2 + x3 + (1 | year/class)
Data: bd
AIC BIC logLik deviance df.resid
37359.7 37479.7 -18663.8 37327.7 13323
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
class:year(Intercept) 0.79701 0.8928
year (Intercept) 0.02131 0.1460
Number of obs: 13339, groups: class:year, 345; year, 15
Zero-inflation model:
Groups Name Variance Std.Dev.
class:year (Intercept) 1.024e+02 1.012e+01
year (Intercept) 7.842e-07 8.856e-04
Number of obs: 13339, groups: class:year, 345; year, 15
Overdispersion parameter for truncated_nbinom2 family (): 1.02
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.25343 0.23081 -5.431 5.62e-08 ***
x1 -0.06433 0.01837 -3.501 0.000464 ***
x2 3.00047 0.14724 20.378 < 2e-16 ***
x3high 0.28756 0.05755 4.997 5.82e-07 ***
x3low 0.11159 0.04277 2.609 0.009083 **
x3reg 0.22669 0.05082 4.461 8.17e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Zero-inflation model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.8188 0.6025 -12.977 < 2e-16 ***
x2 -2.6724 0.4484 -5.959 2.53e-09 ***
x3high 0.2413 0.1784 1.352 0.17635
x3low -0.1325 0.1134 -1.169 0.24258
x3reg -0.3806 0.1436 -2.651 0.00802 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
CI with broom.mixed
> broom.mixed::tidy(m1, effects="fixed", conf.int=TRUE)
# A tibble: 12 x 9
effect component term estimate std.error statistic p.value conf.low conf.high
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed cond (Intercept) -1.25 0.231 -5.43 5.62e- 8 -1.71 -0.801
2 fixed cond x1 -0.0643 0.0184 -3.50 4.64e- 4 -0.100 -0.0283
3 fixed cond x2 3.00 0.147 20.4 2.60e-92 2.71 3.29
4 fixed cond x3high 0.288 0.0575 5.00 5.82e- 7 0.175 0.400
5 fixed cond x3low 0.112 0.0428 2.61 9.08e- 3 0.0278 0.195
6 fixed cond x3reg 0.227 0.0508 4.46 8.17e- 6 0.127 0.326
7 fixed zi (Intercept) -9.88 1.32 -7.49 7.04e-14 -12.5 -7.30
8 fixed zi x1 0.214 0.120 1.79 7.38e- 2 -0.0206 0.448
9 fixed zi x2 -2.69 0.449 -6.00 2.01e- 9 -3.57 -1.81
10 fixed zi x3high 0.232 0.178 1.30 1.93e- 1 -0.117 0.582
11 fixed zi x3low -0.135 0.113 -1.19 2.36e- 1 -0.357 0.0878
12 fixed zi x3reg -0.382 0.144 -2.66 7.74e- 3 -0.664 -0.101
tl;dr as far as I can tell this is a bug in confint.glmmTMB (and probably in the internal function glmmTMB:::getParms). In the meantime, broom.mixed::tidy(m1, effects="fixed") should do what you want. (There's now a fix in progress in the development version on GitHub, should make it to CRAN sometime? soon ...)
Reproducible example:
set up data
set.seed(101)
n <- 1e3
bd <- data.frame(
year=factor(sample(2002:2018, size=n, replace=TRUE)),
class=factor(sample(1:20, size=n, replace=TRUE)),
x1 = rnorm(n),
x2 = rnorm(n),
x3 = factor(sample(c("low","reg","high"), size=n, replace=TRUE),
levels=c("low","reg","high")),
count = rnbinom(n, mu = 3, size=1))
fit
library(glmmTMB)
m1 <- glmmTMB(count~x1+x2+x3+(1|year/class),
data = bd, zi = ~x2+x3+(1|year/class), family = truncated_nbinom2,
)
confidence intervals
confint(m1, "beta_") ## wrong/ incomplete
broom.mixed::tidy(m1, effects="fixed", conf.int=TRUE) ## correct
You may want to think about which kind of confidence intervals you want:
Wald CIs (default) are much faster to compute and are generally OK as long as (1) your data set is large and (2) you aren't estimating any parameters on the log/logit scale that are near the boundaries
likelihood profile CIs are more accurate but much slower
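The same distinction exists for base-R models, which makes it easy to see: for a glm, confint() profiles the likelihood (via MASS), while confint.default() builds Wald intervals from estimate +/- 1.96 * SE. A small illustration (not glmmTMB-specific):

```r
# Logistic regression on a small data set
fit <- glm(am ~ wt, data = mtcars, family = binomial)

confint.default(fit)  # Wald: symmetric, instant
confint(fit)          # profile likelihood: slower, asymmetric, more accurate
```

With a large, well-behaved data set the two sets of intervals are close; near parameter boundaries they can diverge substantially.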
How do I create a data.table in R with the coefficients, std. errors and p-values from an rqpd regression? It's easy to get the coefficients using summary(myregression)[2], but I don't know how to get the std. errors and p-values. Thanks
Try with broom:
library(broom)
library(dplyr)
#Model
mod <- lm(Sepal.Length~.,data=iris)
#Broom
summaryobj <- tidy(mod)
Output:
# A tibble: 6 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 2.17 0.280 7.76 1.43e-12
2 Sepal.Width 0.496 0.0861 5.76 4.87e- 8
3 Petal.Length 0.829 0.0685 12.1 1.07e-23
4 Petal.Width -0.315 0.151 -2.08 3.89e- 2
5 Speciesversicolor -0.724 0.240 -3.01 3.06e- 3
6 Speciesvirginica -1.02 0.334 -3.07 2.58e- 3
Found a solution that works:
summ <- summary(myregression, se = "boot")
summ
str(summ)
PValues <- summ$coefficients[,4]
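More generally, the whole coefficient matrix from summary() converts to a data frame in one step, which should also work on the bootstrapped rqpd summary above; a sketch with a plain lm standing in for the rqpd fit:

```r
fit  <- lm(Sepal.Length ~ ., data = iris)
summ <- summary(fit)

# Term names live in the rownames; keep the original column headers intact
coef_df <- data.frame(term = rownames(summ$coefficients),
                      summ$coefficients,
                      row.names = NULL, check.names = FALSE)
coef_df
```

From there, data.table::as.data.table(coef_df) gives a data.table if that's the target format.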
I have a problem that I have been trying to solve for a couple of hours now, but I simply can't figure it out (I'm new to R btw..).
Basically, what I'm trying to do (using mtcars to illustrate) is to make R test different independent variables (while adjusting for "cyl" and "disp") against the same dependent variable ("mpg"). The best solution I have been able to come up with is:
lm <- lapply(mtcars[,4:6], function(x) lm(mpg ~ cyl + disp + x, data = mtcars))
summary <- lapply(lm, summary)
... where 4:6 corresponds to columns "hp", "drat" and "wt".
This actually works OK, but the problem is that the summary appears with an "x" instead of, for instance, "hp":
$hp
Call:
lm(formula = mpg ~ cyl + disp + x, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.0889 -2.0845 -0.7745 1.3972 6.9183
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
cyl -1.22742 0.79728 -1.540 0.1349
disp -0.01884 0.01040 -1.811 0.0809 .
x -0.01468 0.01465 -1.002 0.3250
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
Questions:
Is there a way to fix this? And have I done this in the smartest way using lapply, or would it be better to use for instance for loops or other options?
Ideally, I would also very much like to make a table showing, for instance, only the estimate and p-value for each variable. Can this somehow be done?
Best regards
One approach to get the name of the variable displayed in the summary is by looping over the names of the variables and setting up the formula using paste and as.formula:
lm <- lapply(names(mtcars)[4:6], function(x) {
formula <- as.formula(paste0("mpg ~ cyl + disp + ", x))
lm(formula, data = mtcars)
})
summary <- lapply(lm, summary)
summary
#> [[1]]
#>
#> Call:
#> lm(formula = formula, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -4.0889 -2.0845 -0.7745 1.3972 6.9183
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
#> cyl -1.22742 0.79728 -1.540 0.1349
#> disp -0.01884 0.01040 -1.811 0.0809 .
#> hp -0.01468 0.01465 -1.002 0.3250
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.055 on 28 degrees of freedom
#> Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
#> F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09
Concerning the second part of your question: one way to achieve this is by making use of broom::tidy from the broom package, which gives you a summary of regression results as a tidy dataframe:
lapply(lm, broom::tidy)
#> [[1]]
#> # A tibble: 4 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 34.2 2.59 13.2 1.54e-13
#> 2 cyl -1.23 0.797 -1.54 1.35e- 1
#> 3 disp -0.0188 0.0104 -1.81 8.09e- 2
#> 4 hp -0.0147 0.0147 -1.00 3.25e- 1
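To keep track of which model each row came from, name the list of fits and bind the tidied tables with an identifying column; a sketch:

```r
library(broom)
library(dplyr)

vars <- names(mtcars)[4:6]  # "hp", "drat", "wt"
fits <- lapply(vars, function(x) {
  lm(as.formula(paste0("mpg ~ cyl + disp + ", x)), data = mtcars)
})
names(fits) <- vars

# One combined table; `added_var` records the extra predictor in each model
bind_rows(lapply(fits, tidy), .id = "added_var")
```

Filtering that table down to term == added_var then gives exactly the estimate and p-value of each tested variable.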
We could use reformulate to create the formula for the lm
lst1 <- lapply(names(mtcars)[4:6], function(x) {
fmla <- reformulate(c("cyl", "disp", x),
response = "mpg")
model <- lm(fmla, data = mtcars)
model$call <- deparse(fmla)
model
})
Then, get the summary
summary1 <- lapply(lst1, summary)
summary1[[1]]
#Call:
#"mpg ~ cyl + disp + hp"
#Residuals:
# Min 1Q Median 3Q Max
#-4.0889 -2.0845 -0.7745 1.3972 6.9183
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 34.18492 2.59078 13.195 1.54e-13 ***
#cyl -1.22742 0.79728 -1.540 0.1349
#disp -0.01884 0.01040 -1.811 0.0809 .
#hp -0.01468 0.01465 -1.002 0.3250
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 3.055 on 28 degrees of freedom
#Multiple R-squared: 0.7679, Adjusted R-squared: 0.743
#F-statistic: 30.88 on 3 and 28 DF, p-value: 5.054e-09