How to add interaction terms in a multinomial regression in R

I am using the mlogit function from the mlogit package to run a multinomial logit regression. I am not sure how to add interaction terms into my model. Here is a toy dataset and my attempt to add interactions:
library(mlogit)
data <- data.frame(y  = sample(1:3, 24, replace = TRUE),
                   x1 = c(rep(1, 12), rep(2, 12)),
                   x2 = rep(c(rep(1, 4), rep(2, 4), rep(3, 4)), 2),
                   x3 = rnorm(24),
                   z1 = sample(1:10, 24, replace = TRUE))
m0 <- mlogit(y ~ 0 | x1 + x2 + x3 + z1, shape = "wide", data = data)      # model with only main effects
m1 <- mlogit(y ~ 0 | (x1 + x2 + x3 + z1)^2, shape = "wide", data = data)  # model with all possible 2-way interactions?
The output from summary(m1) shows:
Coefficients :
               Estimate Std. Error z-value Pr(>|z|)
(Intercept):2  86.41088  164.93831  0.5239   0.6003
(Intercept):3  62.43859  163.57346  0.3817   0.7027
x1:2          -32.27065   82.62474 -0.3906   0.6961
x1:3            0.24661   84.07429  0.0029   0.9977
x2:2          -75.09247   81.36496 -0.9229   0.3561
x2:3          -85.16452   81.40983 -1.0461   0.2955
x3:2          113.11778  119.15990  0.9493   0.3425
x3:3          112.77622  117.74567  0.9578   0.3382
z1:2           11.18665   22.32508  0.5011   0.6163
z1:3           13.15552   22.26441  0.5909   0.5546
x1:2           34.01298   39.66983  0.8574   0.3912
x1:3           32.19141   39.48373  0.8153   0.4149
x1:2          -53.86747   59.75696 -0.9014   0.3674
x1:3          -47.97693   59.09055 -0.8119   0.4168
x1:2           -6.98799   11.29920 -0.6185   0.5363
x1:3          -10.41574   11.52313 -0.9039   0.3660
x2:2            0.59185    6.68807  0.0885   0.9295
x2:3            2.63458    4.94419  0.5329   0.5941
x2:2            0.80945    2.03769  0.3972   0.6912
x2:3            2.60383    2.21878  1.1735   0.2406
x3:2           -0.64112    1.64678 -0.3893   0.6970
x3:3           -2.14289    1.98436 -1.0799   0.2802
It is not clear to me from the first column which specific interactions these rows correspond to; labels such as x1:2 appear several times. Any pointers will be greatly appreciated!

This might be a clearer way to do it:
library(dplyr)
library(broom)
library(nnet)
multinom(formula = y ~ (x1 + x2 + x3 + z1)^2, data = data) %>%
  tidy()
# A tibble: 22 x 6
   y.level term        estimate std.error statistic p.value
   <chr>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 2       (Intercept)  -158.      247.     -0.640    0.522
 2 2       x1           -388.      247.     -1.57     0.116
 3 2       x2            -13.4     248.     -0.0543   0.957
 4 2       x3            120.      334.      0.360    0.719
 5 2       z1            173.      968.      0.179    0.858
 6 2       x1:x2         337.      248.      1.36     0.174
 7 2       x1:x3          40.2     334.      0.120    0.904
 8 2       x1:z1         -53.8     968.     -0.0555   0.956
 9 2       x2:x3        -137.     1018.     -0.135    0.893
10 2       x2:z1         -76.6     910.     -0.0841   0.933
# … with 12 more rows
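As for the original mlogit output: the repeated labels are most likely the six two-way interactions printed with truncated names, in the order in which R expands (x1 + x2 + x3 + z1)^2: main effects first, then x1:x2, x1:x3, x1:z1, x2:x3, x2:z1, x3:z1. A hedged way to check that order is to build the corresponding model matrix yourself on the same wide-format data:
# Assumption: mlogit lists coefficients in the model-matrix expansion
# order of the formula's right-hand side.
colnames(model.matrix(~ (x1 + x2 + x3 + z1)^2, data = data))
# "(Intercept)" "x1" "x2" "x3" "z1" "x1:x2" "x1:x3" "x1:z1" "x2:x3" "x2:z1" "x3:z1"
Read that way, the three x1:* pairs above are x1:x2, x1:x3 and x1:z1, the two x2:* pairs are x2:x3 and x2:z1, and the last x3:* pair is x3:z1, matching the term names that nnet::multinom prints explicitly.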

Related

How can I access specific values of a lavaan model by code?

I have to run many CFAs and want to automate saving specific output values in a data frame so I can convert it to a latex table later.
Specifically I get my output something like this using lavaan:
model <- 'y =~ x1 + x2 + x3'
fit <- cfa(model, data)
sum <- summary(fit, fit.measures = TRUE, standardized = TRUE)
I managed to extract some values, like p_val <- sum$test$standard$pvalue, but I couldn't figure out how to get CFI, TLI, RMSEA, and SRMR. I think I'm even missing the right search terms to Google the problem successfully.
How can I access these values from the summary-object? I'd be grateful if you could provide me with the right code or point me to a resource that explains it!
Here's an excerpt of the cfa summary:
lavaan 0.6-12 ended normally after 42 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         9
  Number of observations                           213

Model Test User Model:
  Test statistic                                 1.625
  Degrees of freedom                                 1
  P-value (Chi-square)                           0.202

User Model versus Baseline Model:
  Comparative Fit Index (CFI)                    0.997
  Tucker-Lewis Index (TLI)                       0.983

Root Mean Square Error of Approximation:
  RMSEA                                          0.054
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.200
  P-value RMSEA <= 0.05                          0.315

Standardized Root Mean Square Residual:
  SRMR                                           0.014
I looked at 'sum' in the environment inspector in R-Studio (where I found the location of the p-value) and searched the documentation of lavaan, but to no avail.
Since the values I'm looking for appear in the output I expect they must be stored somewhere in the summary-object.
lavaan has a number of helper functions to extract statistics from the model object. In this case you can use fitMeasures():
fitMeasures(fit, c("pvalue", "cfi", "tli", "rmsea","srmr"))
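If you need this for many models at once, which is the original goal of building a table for LaTeX, note that fitMeasures() returns a named numeric vector, so results from several fits stack cleanly. A small sketch, assuming your fitted models live in a hypothetical named list called fits:
fits <- list(model1 = fit)  # hypothetical list; add further cfa() fits here
fit_table <- as.data.frame(do.call(rbind,
  lapply(fits, fitMeasures, fit.measures = c("pvalue", "cfi", "tli", "rmsea", "srmr"))))
fit_table  # one row per model; pass to e.g. knitr::kable() for a LaTeX table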
Use broom::tidy() and broom::glance() to get model information in a data.frame object. I will use the standard example from the lavaan package:
library(lavaan)
#> This is lavaan 0.6-12
#> lavaan is FREE software! Please report any bugs.
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data = HolzingerSwineford1939)
broom::tidy(fit)
#> # A tibble: 24 × 9
#> term op estimate std.e…¹ stati…² p.value std.lv std.all std.nox
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 visual =~ x1 =~ 1 0 NA NA 0.900 0.772 0.772
#> 2 visual =~ x2 =~ 0.554 0.0997 5.55 2.80e- 8 0.498 0.424 0.424
#> 3 visual =~ x3 =~ 0.729 0.109 6.68 2.31e-11 0.656 0.581 0.581
#> 4 textual =~ x4 =~ 1 0 NA NA 0.990 0.852 0.852
#> 5 textual =~ x5 =~ 1.11 0.0654 17.0 0 1.10 0.855 0.855
#> 6 textual =~ x6 =~ 0.926 0.0554 16.7 0 0.917 0.838 0.838
#> 7 speed =~ x7 =~ 1 0 NA NA 0.619 0.570 0.570
#> 8 speed =~ x8 =~ 1.18 0.165 7.15 8.56e-13 0.731 0.723 0.723
#> 9 speed =~ x9 =~ 1.08 0.151 7.15 8.40e-13 0.670 0.665 0.665
#> 10 x1 ~~ x1 ~~ 0.549 0.114 4.83 1.34e- 6 0.549 0.404 0.404
#> # … with 14 more rows, and abbreviated variable names ¹​std.error, ²​statistic
broom::glance(fit)
#> # A tibble: 1 × 17
#> agfi AIC BIC cfi chisq npar rmsea rmsea.conf.h…¹ srmr tli conve…²
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1 0.894 7517. 7595. 0.931 85.3 21 0.0921 0.114 0.0652 0.896 TRUE
#> # … with 6 more variables: estimator <chr>, ngroups <int>,
#> # missing_method <chr>, nobs <int>, norig <int>, nexcluded <int>, and
#> # abbreviated variable names ¹​rmsea.conf.high, ²​converged
Created on 2023-01-26 with reprex v2.0.2
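Since the glance() output above already contains the measures from the question as columns, pulling out just those is a one-liner (a small sketch using dplyr; fit is the model fitted above):
library(dplyr)
broom::glance(fit) %>% select(cfi, tli, rmsea, srmr)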

Loop in Cox regression

I am trying to run a Cox regression for 1000 variables (exposures), as below:
varlist <- names(dataset)[275:1275]
sumtables <- lapply(varlist, function(i) {
  iformula <- as.formula(sprintf("Surv(time_cox, events) ~ %s + age + age2", i))
  x <- coxph(iformula, data = dataset, na.action = na.omit)
  summary(x)[7][[1]]  ##### summary(x)[8][[1]]
})
It works well, but I don't know how to extract the data for each variable (beta and SE) and run the Benjamini-Hochberg procedure on the p-values.
Any help is appreciated! Thanks
I am assuming here that all the variables in varlist are either binary or numeric.
sumtables <- lapply(varlist, function(i) {
  iformula <- as.formula(sprintf("Surv(time_cox, events) ~ %s + age + age2", i))
  x <- coxph(iformula, data = dataset, na.action = na.omit)
  data.frame(pvalue = drop1(x, scope = i, test = "Chisq")[2, 4],
             coef   = coef(x)[i])
})
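For the Benjamini-Hochberg part of the question, base R's p.adjust() is all you need. A small sketch, assuming the sumtables list built above (each element is a one-row data frame):
results <- do.call(rbind, sumtables)  # stack the per-variable rows
results$variable <- varlist           # record which exposure each row belongs to
results$p_bh <- p.adjust(results$pvalue, method = "BH")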
You could use purrr::map to get a tidy dataframe of all your coefficients, SEs, p-values etc. from the vector of tested exposures. Modifying your code a little to work with the veteran dataset:
library(survival)
library(tidyverse)
exp_vars <- names(veteran[, c(1, 2, 5, 6, 8)])
tibble(exp_vars) %>%
  group_by(exp_vars) %>%
  mutate(cox_mod = map(exp_vars, function(exposure) {
    iformula <- as.formula(sprintf("Surv(time, status) ~ %s + age", exposure))
    coxph(iformula, data = veteran, na.action = na.omit)
  }),
  coefs = list(rownames_to_column(data.frame(
    summary(cox_mod[[1]])$coefficients
  )))) %>%
  unnest(coefs)
#> # A tibble: 12 x 8
#> # Groups: exp_vars [5]
#> exp_vars cox_mod rowname coef exp.coef. se.coef. z Pr...z..
#> <chr> <list> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 trt <coxph> trt -0.00365 0.996 0.183 -0.0200 9.84e- 1
#> 2 trt <coxph> age 0.00753 1.01 0.00966 0.779 4.36e- 1
#> 3 celltype <coxph> celltypesmallc~ 0.992 2.70 0.254 3.91 9.40e- 5
#> 4 celltype <coxph> celltypeadeno 1.16 3.17 0.293 3.94 8.07e- 5
#> 5 celltype <coxph> celltypelarge 0.235 1.27 0.278 0.848 3.97e- 1
#> 6 celltype <coxph> age 0.00590 1.01 0.00935 0.631 5.28e- 1
#> 7 karno <coxph> karno -0.0337 0.967 0.00520 -6.48 8.94e-11
#> 8 karno <coxph> age -0.00239 0.998 0.00908 -0.263 7.92e- 1
#> 9 diagtime <coxph> diagtime 0.00943 1.01 0.00892 1.06 2.90e- 1
#> 10 diagtime <coxph> age 0.00797 1.01 0.00961 0.830 4.07e- 1
#> 11 prior <coxph> prior -0.0135 0.987 0.0201 -0.674 5.00e- 1
#> 12 prior <coxph> age 0.00715 1.01 0.00955 0.749 4.54e- 1
Created on 2022-03-16 by the reprex package (v2.0.1)
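To add the multiple-testing step from the original question, drop the age-adjustment rows and apply p.adjust() across the exposure terms. A sketch, assuming the result of the pipeline above has been saved as cox_res (a hypothetical name):
cox_res %>%
  ungroup() %>%                 # p.adjust() must see all p-values at once
  filter(rowname != "age") %>%  # keep only the exposure terms
  mutate(p_bh = p.adjust(Pr...z.., method = "BH"))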

How to write a function that will run multiple regression models of the same type with different dependent variables and then store them as lists?

I am trying to write a function that will run multiple regressions and then store the outputs in a vector. What I want is for the function to pick the dependent variables from a list that I will provide, and then run the regressions on the same right hand-side variables. Not sure how to go about doing this. Any hints will be appreciated.
my_data <- data.frame(x1 = (1:10) + rnorm(10, 3, 1.5),
                      x2 = 25/3 + rnorm(10, 0, 1),
                      dep.var1 = seq(5, 28, 2.5),
                      dep.var2 = seq(100, -20, -12.5),
                      dep.var3 = seq(1, 25, 2.5))
## A list that tells the function which dependent variables
## to use from my_data:
dep.var <- list(dep.var1 = my_data$dep.var1, dep.var2 = my_data$dep.var2)
all_models <- function(dep.var) {lm(dep.var ~ x1 + x2, data = my_data)}
model <- sapply(dep.var, all_models)  ## "sapply" here tells the function to
                                      ## take the dependent variables from dep.var
I want the "model" list to have two objects: model1 with dep.var1 and model2 with dep.var2. Then as required, I will use summary(model#) to see the regression output.
I know that this in theory works when a vector is used (i.e., p):
p <- seq(0.25, 0.95, 0.05)
s <- function(p) {1 - pnorm(35, p*1*44, sqrt(44)*sqrt(p*(1 - p)))}
f <- sapply(p, s)
But I can't get the whole thing to work as required for my regression models. It works somewhat because you can run and check "model" and it will show you the two regression outputs - but it is horrible. And the "model" does not show the regression specification, i.e., dep.var1 ~ x1 + x2.
Consider reformulate to dynamically change model formulas using character values for lm calls:
# VECTOR OF COLUMN NAMES (NOT VALUES)
dep.vars <- c("dep.var1", "dep.var2")
# USER-DEFINED METHOD TO PROCESS DIFFERENT DEP VARS
run_model <- function(dep.var) {
  fml <- reformulate(c("x1", "x2"), dep.var)
  lm(fml, data = my_data)
}
# NAMED LIST OF MODELS
all_models <- sapply(dep.vars, run_model, simplify = FALSE)
# OUTPUT RESULTS
all_models$dep.var1
all_models$dep.var2
...
From there, you can run further extractions or processes across model objects:
# NAMED LIST OF MODEL SUMMARIES
all_summaries <- lapply(all_models, summary)
all_summaries$dep.var1
all_summaries$dep.var2
...
# NAMED LIST OF MODEL COEFFICIENTS
all_coefficients <- lapply(all_models, `[`, "coefficients")
all_coefficients$dep.var1
all_coefficients$dep.var2
...
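From there it is also easy to collapse everything into a single coefficient table; a small sketch, assuming the broom and dplyr packages are available (all_models is the named list built above):
library(broom)
library(dplyr)
bind_rows(lapply(all_models, tidy), .id = "model")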
You could sapply over the names of the dependent vars which you could nicely identify with grep. In lm use reformulate to build the formula.
sapply(grep('^dep', names(my_data), value = TRUE), \(x)
       lm(reformulate(c('x1', 'x2'), x), my_data))
# dep.var1 dep.var2 dep.var3
# coefficients numeric,3 numeric,3 numeric,3
# residuals numeric,10 numeric,10 numeric,10
# effects numeric,10 numeric,10 numeric,10
# rank 3 3 3
# fitted.values numeric,10 numeric,10 numeric,10
# assign integer,3 integer,3 integer,3
# qr qr,5 qr,5 qr,5
# df.residual 7 7 7
# xlevels list,0 list,0 list,0
# call expression expression expression
# terms dep.var1 ~ x1 + x2 dep.var2 ~ x1 + x2 dep.var3 ~ x1 + x2
# model data.frame,3 data.frame,3 data.frame,3
The dep.var* appear nicely in the result.
However, you probably want to use lapply and pipe it into setNames() to get the list elements named. Instead of grep you may of course define the dependent variables manually. To get a clean formula call stored, we use a trick @G. Grothendieck once taught me using do.call.
dv <- as.list(grep('^dep', names(my_data), value=TRUE)[1:2])
res <- lapply(dv, \(x) {
f <- reformulate(c('x1', 'x2'), x)
do.call('lm', list(f, quote(my_data)))
}) |>
setNames(dv)
res
# $dep.var1
#
# Call:
# lm(formula = dep.var1 ~ x1 + x2, data = my_data)
#
# Coefficients:
# (Intercept) x1 x2
# -4.7450 2.3398 0.2747
#
#
# $dep.var2
#
# Call:
# lm(formula = dep.var2 ~ x1 + x2, data = my_data)
#
# Coefficients:
# (Intercept) x1 x2
# 148.725 -11.699 -1.373
This allows you to get the summary() of the objects, which probably is what you want.
summary(res$dep.var1)
# Call:
# lm(formula = dep.var1 ~ x1 + x2, data = my_data)
#
# Residuals:
# Min 1Q Median 3Q Max
# -2.8830 -1.8345 -0.2326 1.4335 4.2452
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -4.7450 7.2884 -0.651 0.536
# x1 2.3398 0.2836 8.251 7.48e-05 ***
# x2 0.2747 0.7526 0.365 0.726
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 2.55 on 7 degrees of freedom
# Multiple R-squared: 0.9117, Adjusted R-squared: 0.8865
# F-statistic: 36.14 on 2 and 7 DF, p-value: 0.0002046
Finally you could wrap it in a function
calc_models <- \(dv) {
lapply(dv, \(x) {
f <- reformulate(c('x1', 'x2'), x)
do.call('lm', list(f, quote(my_data)))
}) |>
setNames(dv)
}
calc_models(list('dep.var1', 'dep.var2'))
Here is a way to iterate through your dataframe, applying the function to each group you define (here: the dependent variable) and saving the different models in a dataframe:
library(tidyverse)
library(broom)
my_data %>%
  pivot_longer(starts_with("dep"),
               names_to = "group",
               values_to = "dep.var") %>%
  mutate(group = as.factor(group)) %>%
  group_by(group) %>%
  group_split() %>%
  map_dfr(.f = function(df) {
    lm(dep.var ~ x1 + x2, data = df) %>%
      tidy() %>%       # first output
      # glance() %>%   # second output
      add_column(group = unique(df$group), .before = 1)
  })
dataframe output:
# A tibble: 9 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var1 (Intercept) -5.29 11.6 -0.456 0.662
2 dep.var1 x1 2.11 0.268 7.87 0.000101
3 dep.var1 x2 0.538 1.23 0.437 0.675
4 dep.var2 (Intercept) 151. 57.9 2.61 0.0347
5 dep.var2 x1 -10.6 1.34 -7.87 0.000101
6 dep.var2 x2 -2.69 6.15 -0.437 0.675
7 dep.var3 (Intercept) -9.29 11.6 -0.802 0.449
8 dep.var3 x1 2.11 0.268 7.87 0.000101
9 dep.var3 x2 0.538 1.23 0.437 0.675
list output:
[[1]]
# A tibble: 3 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var1 (Intercept) -5.29 11.6 -0.456 0.662
2 dep.var1 x1 2.11 0.268 7.87 0.000101
3 dep.var1 x2 0.538 1.23 0.437 0.675
[[2]]
# A tibble: 3 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var2 (Intercept) 151. 57.9 2.61 0.0347
2 dep.var2 x1 -10.6 1.34 -7.87 0.000101
3 dep.var2 x2 -2.69 6.15 -0.437 0.675
[[3]]
# A tibble: 3 x 6
group term estimate std.error statistic p.value
<fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 dep.var3 (Intercept) -9.29 11.6 -0.802 0.449
2 dep.var3 x1 2.11 0.268 7.87 0.000101
3 dep.var3 x2 0.538 1.23 0.437 0.675
glance output:
group r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 dep.var1 0.927 0.906 2.32 44.3 0.000106 2 -20.8 49.7 50.9 37.8 7 10
2 dep.var2 0.927 0.906 11.6 44.3 0.000106 2 -36.9 81.9 83.1 944. 7 10
3 dep.var3 0.927 0.906 2.32 44.3 0.000106 2 -20.8 49.7 50.9 37.8 7 10

Many regressions using tidyverse and broom: Same dependent variable, different independent variables

This link shows how to answer my question in the case where we have the same independent variables, but potentially many different dependent variables: Use broom and tidyverse to run regressions on different dependent variables.
But my question is, how can I apply the same approach (e.g., tidyverse and broom) to run many regressions in the reverse situation: the same dependent variable but different independent variables? In line with the code in the previous link, something like:
mod = lm(health ~ cbind(sex, income, happiness) + faculty, ds) %>% tidy()
However, this code does not do exactly what I want: cbind() on the right-hand side simply enters all the variables into a single model, so the underlying fit is:
Call:
lm(formula = health ~ cbind(sex, income, happiness) + faculty, data = ds)
which is equivalent to:
lm(formula = health ~ sex + income + happiness + faculty, data = ds)
Basically you'll need some way to create all the different formulas you want. Here's one way
qq <- expression(sex,income,happiness)
formulae <- lapply(qq, function(v) bquote(health~.(v)+faculty))
# [[1]]
# health ~ sex + faculty
# [[2]]
# health ~ income + faculty
# [[3]]
# health ~ happiness + faculty
Once you have all your formulas, you can map them to lm() and then to tidy():
library(purrr)
library(broom)
formulae %>% map(~lm(.x, ds)) %>% map_dfr(tidy, .id="model")
# A tibble: 9 x 6
# model term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 (Intercept) 19.5 0.504 38.6 1.13e-60
# 2 1 sex 0.755 0.651 1.16 2.49e- 1
# 3 1 faculty -0.00360 0.291 -0.0124 9.90e- 1
# 4 2 (Intercept) 19.8 1.70 11.7 3.18e-20
# 5 2 income -0.000244 0.00162 -0.150 8.81e- 1
# 6 2 faculty 0.143 0.264 0.542 5.89e- 1
# 7 3 (Intercept) 18.4 1.88 9.74 4.79e-16
# 8 3 happiness 0.205 0.299 0.684 4.96e- 1
# 9 3 faculty 0.141 0.262 0.539 5.91e- 1
Using sample data
set.seed(11)
ds <- data.frame(income    = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health    = rnorm(100, mean = 20, sd = 3),
                 sex       = c(0, 1),
                 faculty   = c(0, 1, 2, 3))
You could use the combn function to get all combinations of n independent variables and then iterate over them. Let's say n=3 here:
library(tidyverse)
ds <- data.frame(income    = rnorm(100, mean = 1000, sd = 200),
                 happiness = rnorm(100, mean = 6, sd = 1),
                 health    = rnorm(100, mean = 20, sd = 3),
                 sex       = c(0, 1),
                 faculty   = c(0, 1, 2, 3))
ivs = combn(names(ds)[names(ds)!="income"], 3, simplify=FALSE)
# Or, to get all models with 1 to 4 variables:
# ivs = map(1:4, ~combn(names(ds)[names(ds)!="income"], .x, simplify=FALSE)) %>%
# flatten()
names(ivs) = map(ivs, ~paste(.x, collapse="-"))
models = map(ivs,
~lm(as.formula(paste("income ~", paste(.x, collapse="+"))), data=ds))
map_df(models, broom::tidy, .id="model")
model term estimate std.error statistic p.value
* <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 happiness-health-sex (Intercept) 1086. 201. 5.39 5.00e- 7
2 happiness-health-sex happiness -25.4 21.4 -1.19 2.38e- 1
3 happiness-health-sex health 3.58 6.99 0.512 6.10e- 1
4 happiness-health-sex sex 11.5 41.5 0.277 7.82e- 1
5 happiness-health-faculty (Intercept) 1085. 197. 5.50 3.12e- 7
6 happiness-health-faculty happiness -25.8 20.9 -1.23 2.21e- 1
7 happiness-health-faculty health 3.45 6.98 0.494 6.23e- 1
8 happiness-health-faculty faculty 7.86 18.2 0.432 6.67e- 1
9 happiness-sex-faculty (Intercept) 1153. 141. 8.21 1.04e-12
10 happiness-sex-faculty happiness -25.9 21.4 -1.21 2.28e- 1
11 happiness-sex-faculty sex 3.44 46.2 0.0744 9.41e- 1
12 happiness-sex-faculty faculty 7.40 20.2 0.366 7.15e- 1
13 health-sex-faculty (Intercept) 911. 143. 6.35 7.06e- 9
14 health-sex-faculty health 3.90 7.03 0.554 5.81e- 1
15 health-sex-faculty sex 15.6 45.6 0.343 7.32e- 1
16 health-sex-faculty faculty 7.02 20.4 0.345 7.31e- 1
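If you also want one-row fit summaries per model, e.g. to compare the combinations by AIC or R-squared, the same pattern works with broom::glance() in place of broom::tidy() (a sketch using the models list from above):
map_df(models, broom::glance, .id = "model")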

How can I export corresponding statistics from a linear regression model to a table in R?

I am fitting a multiple linear regression model with 1 dependent variable and 45 independent variables. I would like to put the corresponding effect, 95% CI, and p-value into a table.
My data set is as below:
a   b   x1  x2  x3  x4  ...  x45
23  15  1   34  4   45       8
10  45  2   15  2   55       18
Is there a more convenient way to write the regression syntax, rather than writing lm(y ~ x1 + x2 + x3 + x4 + ... + x45, data = df), as I have 45 variables with different names?
I've tried
mod1 <- lm(a ~ x1 + x2 + ....+x45, data=df)
mod2 <- lm(b ~ x1 + x2 + .... +x45, data=df)
But I don't know how to put the outcomes into a table like the following:
model  variable  effect   95%CI              p
mod1   x1        177.93   79.16 to 276.71    0.003
mod1   x2        -75.13   -116.46 to -33.8   0.003
...
mod1   x45       118.61   53.09 to 184.13    0.005
mod2   x1        79.53    36.94 to 122.13    0.004
mod2   x2        201.93   60.48 to 343.38    0.01
...
mod2   x45       61.56    20.87 to 102.25    0.005
It would be great if anyone can help. Many thanks!
Assuming that your dataset contains only the dependent variable and the relevant predictors, you can specify the regression like this:
lm(a ~ ., data = df)
In your case you could use dplyr::select to remove b or a:
library(dplyr)
mod1 <- df %>%
  select(-b) %>%
  lm(a ~ ., data = .)
To summarise outcomes in a table, look at the R packages stargazer or sjPlot or broom. I like sjPlot::tab_model as it generates clean HTML tables in RStudio or in RMarkdown documents.
library(sjPlot)
iris %>%
  lm(Petal.Length ~ ., data = .) %>%
  tab_model()
Or using broom::tidy:
library(broom)
iris %>%
  lm(Petal.Length ~ ., data = .) %>%
  tidy()
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1.11 0.270 -4.12 6.45e- 5
2 Sepal.Length 0.608 0.0502 12.1 1.07e-23
3 Sepal.Width -0.181 0.0804 -2.25 2.62e- 2
4 Petal.Width 0.602 0.121 4.96 1.97e- 6
5 Speciesversicolor 1.46 0.173 8.44 3.14e-14
6 Speciesvirginica 1.97 0.245 8.06 2.60e-13
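To also get the 95% CIs the question asks for, tidy() for lm objects accepts conf.int and conf.level arguments, which add conf.low and conf.high columns:
library(broom)
iris %>%
  lm(Petal.Length ~ ., data = .) %>%
  tidy(conf.int = TRUE, conf.level = 0.95)
Those columns map directly onto the effect / 95% CI / p layout sketched in the question.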
When I used the lmer function for the regression, this code extracted the table of results. Please refer to the code!
library(lme4)
library(broom)
# Fit the mixed model and keep its summary
RESULT_model1 <- summary(model1 <- lmer(
  LOGTHSMON_SELNG_AMT ~ . - BLK_CD - LOGPERTHSMON_SELNG_AMT
    - STDR_YM_CD - bub_dong - M201507 - M201508 - M201509 - STOR_CO
    + (1 | BLK_CD),
  data = dta_fin_WEST))
# Coefficient matrix to data frame (older broom versions tidy matrices,
# exposing a .rownames column)
RESULT_model1_cof <- as.data.frame(tidy(RESULT_model1$coefficients))
# lmer reports no p-values; approximate them from the t-values
RESULT_model1_cof$p.value <- 2 * (1 - pnorm(abs(RESULT_model1_cof$t.value)))
RESULT_model1_cof$VAR <- RESULT_model1_cof$.rownames
RESULT_model1_cof$STDERR <- RESULT_model1_cof$Std..Error
