I would like to create a regression table in R Markdown that includes the exponentiated coefficients, the exponentiated lower and upper 95% confidence limits, and the p-value for each variable in a logistic regression model.
```{r}
#basic table
library(knitr)
x1 <- rnorm(100,0,1)
x2 <- rpois(100,5)
y1 <- rbinom(100,1,0.33)
df <- data.frame(x1,x2,y1)
modelx <- glm(y1 ~ x1 + x2, data = df ,family = "binomial")
kable(summary(modelx)$coef)
#maneuvers to obtain OR and 95% CI
orx <- exp(c(OR = coef(modelx), confint(modelx)))
kable(orx)
```
I've tried the approach above, which computes the values but loses the labels and the general kable-friendly form. I have also tried substituting the exponentiated coefficients directly into the glm object modelx, but that object cannot easily hold the 95% CIs.
Is there a simple way to accomplish this without manually building the table in tables or another kable-friendly package?
The process requires converting summary(modelx) to a data.frame, carrying out the necessary calculations while adding the results to that data.frame, and finally passing it to kable, with kableExtra options to finish the formatting. The intervals below are Wald intervals (Estimate ± 1.96 × SE) computed on the log-odds scale and then exponentiated.
```{r results='asis', echo=FALSE}
#basic table
library(knitr)
library(kableExtra)
x1 <- rnorm(100,0,1)
x2 <- rpois(100,5)
y1 <- rbinom(100,1,0.33)
df <- data.frame(x1,x2,y1)
modelx <- glm(y1 ~ x1 + x2, data = df ,family = "binomial")
tableit <- data.frame(summary(modelx)$coef)
tableit$OR <- exp(tableit$Estimate)
tableit$LCL <- exp(tableit$Estimate - tableit$Std..Error * 1.96 )
tableit$UCL <- exp(tableit$Estimate + tableit$Std..Error * 1.96 )
tableit$`p-value` <- tableit$Pr...z..
tableit <- tableit[c("OR", "LCL", "UCL", "p-value")]
kable(tableit, digits = 2, align = rep('c', 4)) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
```
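A base-R variant, sketched under the assumption that Wald intervals are acceptable: `confint.default()` returns the intervals on the log-odds scale as a matrix, and building the data frame from the named coefficient vector keeps the row labels that `exp(c(...))` drops in the question. The seed and the displayed column labels are illustrative choices, not part of the original answer.

```r
library(knitr)

set.seed(42)
df <- data.frame(x1 = rnorm(100), x2 = rpois(100, 5), y1 = rbinom(100, 1, 0.33))
modelx <- glm(y1 ~ x1 + x2, data = df, family = "binomial")

# Wald intervals on the log-odds scale (estimate +/- qnorm(0.975) * SE);
# confint(modelx) would give profile-likelihood intervals instead
ci <- confint.default(modelx)

# Row names are taken from the named coefficient vector, so the labels survive
or_table <- data.frame(
  OR      = exp(coef(modelx)),
  LCL     = exp(ci[, 1]),
  UCL     = exp(ci[, 2]),
  p.value = coef(summary(modelx))[, "Pr(>|z|)"]
)

kable(or_table, digits = 3, col.names = c("OR", "2.5 %", "97.5 %", "p-value"))
```

Swapping `confint.default(modelx)` for `confint(modelx)` gives profile-likelihood intervals, which can differ noticeably from Wald intervals in small samples; the rest of the code stays the same.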
I am new to R and am trying to loop a mixed model across 90 columns in a dataset.
My dataset looks like the one below but has 90 predictors instead of 7, each of which I need to evaluate as a fixed effect in a separate model.
I then need to store the model output (coefficients and p-values) to construct a figure summarizing the effect size of each predictor. I am aware of the discussion around p-value estimation for lme4 mixed models.
For example:
library(dplyr)  # provides tibble(), arrange() and %>%
set.seed(101)
mydata <- tibble(id = rep(1:32, times = 25),
                 time = sample(1:800),
                 experiment = rep(1:4, times = 200),
                 Y = sample(1:800),
                 predictor_1 = runif(800),
                 predictor_2 = rnorm(800),
                 predictor_3 = sample(1:800),
                 predictor_4 = sample(1:800),
                 predictor_5 = 1:800,
                 predictor_6 = sample(1:800),
                 predictor_7 = runif(800)) %>%
  arrange(id, time)
The model to iterate across the N predictors is:
library(lme4)
library(lmerTest) # masks lme4::lmer so that summary() also reports p-values
mixed.model <- lmer(Y ~ predictor_1 + time + (1|id) + (1|experiment), data = mydata)
summary(mixed.model)
My coding skills are not yet good enough to write a loop that repeats the model across the N predictors in my dataset and stores the coefficients and p-values in a data frame.
I have been able to iterate over all the predictors with linear models instead of mixed models using lapply, but I have failed to apply this strategy to mixed models.
varlist <- names(mydata)[5:11]
lm_models <- lapply(varlist, function(x) {
lm(substitute(Y ~ i, list(i = as.name(x))), data = mydata)
})
One option is to update the formula of a restricted model (without the predictor) in an lapply loop over the predictors, then summarize each fitted model and subset its coefficient matrix using a Vectorized function.
library(lmerTest)
mixed.model <- lmer(Y ~ time + (1|id) + (1|experiment), data = mydata)
preds <- grep('pred', names(mydata), value=TRUE)
fits <- lapply(preds, \(x) update(mixed.model, paste('. ~ . + ', x)))
extract_coef_p <- Vectorize(\(x) x |> summary() |> coef() |> {\(.) .[3, c(1, 5)]}())
res <- `rownames<-`(t(extract_coef_p(fits)), preds)
res
# Estimate Pr(>|t|)
# predictor_1 -7.177579138 0.8002737
# predictor_2 -5.010342111 0.5377551
# predictor_3 -0.013030513 0.7126500
# predictor_4 -0.041702039 0.2383835
# predictor_5 -0.001437124 0.9676346
# predictor_6 0.005259293 0.8818644
# predictor_7 31.304496255 0.2511275
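A variant of the same idea, sketched under the assumption that each model has the form `Y ~ predictor + time + (1 | id) + (1 | experiment)`: build each formula as a string and pick the predictor's row by name rather than by position (`[3, ]`), collecting the estimate, standard error and p-value into one data frame that is ready for plotting. Only three predictors are simulated here to keep the example short.

```r
library(lmerTest)  # attaches lme4; summary() then reports Satterthwaite-df p-values

set.seed(101)
mydata <- data.frame(
  id          = rep(1:32, times = 25),
  time        = sample(1:800),
  experiment  = rep(1:4, times = 200),
  Y           = sample(1:800),
  predictor_1 = runif(800),
  predictor_2 = rnorm(800),
  predictor_3 = sample(1:800)
)

preds <- grep("^predictor_", names(mydata), value = TRUE)

# One model per predictor; the predictor's row is selected by name,
# so the result does not depend on where the term lands in the coefficient table
rows <- lapply(preds, function(p) {
  f <- as.formula(paste0("Y ~ ", p, " + time + (1 | id) + (1 | experiment)"))
  ctab <- coef(summary(lmer(f, data = mydata)))
  data.frame(predictor = p,
             estimate  = ctab[p, "Estimate"],
             std.error = ctab[p, "Std. Error"],
             p.value   = ctab[p, "Pr(>|t|)"])
})
res <- do.call(rbind, rows)
res
```

With the standard errors kept, the effect-size figure can be drawn directly from `res` (for example estimates with ±1.96 × SE bars), and with 90 predictors it may be worth adjusting the p-values with `p.adjust()`.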
I am trying to produce a nice regression table of marginal effects and p-values from the probitmfx function, with each p-value reported under the corresponding marginal effect. An example of what I'd like it to look like is here: Similar Output from Stata.
I tried the stargazer function, as suggested here, but it does not seem to work when the model is not a standard OLS or probit fit.
library(haven)      # read_dta()
library(mfx)        # probitmfx()
library(stargazer)
data_T1 <- read_dta("xxx")
# specification (1)
T1_1 <- probitmfx(y ~ x1 + x2 + x3, data = data_T1)
# specification (2)
T1_2 <- probitmfx(y ~ x1 + x2 + x3 + x4 + x5, data = data_T1)
# this is what I tried, but it does not work
table1 <- stargazer(coef = list(T1_1$mfxest[, 1], T1_2$mfxest[, 1]),
                    p = list(T1_1$mfxest[, 4], T1_2$mfxest[, 4]), type = "text")
Any suggestions how I can design such a table in R?
You can probably use the parameters package to produce a nicely formatted table:
Code:
library(mfx)
library(parameters)
# simulate some data
set.seed(12345)
n <- 1000
x <- rnorm(n)
# binary outcome
y <- ifelse(pnorm(1 + 0.5 * x + rnorm(n)) > 0.5, 1, 0)
data <- data.frame(y, x)
mod <- probitmfx(formula = y ~ x, data = data)
print_html(model_parameters(mod))
This produces an HTML table that can be used in R Markdown.
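If the goal is the Stata-style layout specifically, the table can also be assembled by hand from each model's `mfxest` matrix. The sketch below uses a hypothetical helper, `stack_mfx()`, which is not part of mfx or any other package; it assumes column 1 of `mfxest` holds dF/dx and column 4 the p-value (as in the question's code), and uses simulated data in place of `data_T1`.

```r
library(mfx)

# Hypothetical helper (not part of mfx): put each marginal effect on one row and
# its p-value in parentheses on the row below, one column per model.
stack_mfx <- function(models, digits = 3) {
  ests <- lapply(models, function(m) m$mfxest)
  vars <- unique(unlist(lapply(ests, rownames)))
  fmt  <- function(v) formatC(v, format = "f", digits = digits)
  # Row labels interleave variable names with blanks: x1, "", x2, "", ...
  out  <- matrix("", nrow = 2 * length(vars), ncol = length(ests),
                 dimnames = list(as.vector(rbind(vars, "")), names(models)))
  for (j in seq_along(ests)) {
    e <- ests[[j]]
    i <- 2 * match(rownames(e), vars) - 1          # row of each effect
    out[i, j]     <- fmt(e[, 1])                    # marginal effect (dF/dx)
    out[i + 1, j] <- paste0("(", fmt(e[, 4]), ")")  # p-value underneath
  }
  out
}

set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- as.integer(0.3 + 0.5 * d$x1 - 0.4 * d$x2 + rnorm(n) > 0)

m1 <- probitmfx(y ~ x1, data = d)
m2 <- probitmfx(y ~ x1 + x2, data = d)

tab <- stack_mfx(list("(1)" = m1, "(2)" = m2))
print(noquote(tab))
```

Variables absent from a specification are left blank, as in esttab output. In R Markdown, `knitr::kable(tab)` should render the same character matrix as a table.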
The question is as it sounds: at present I use the broom package to tidy my regression summary and then write_csv to write that summary to a CSV file. The problem is that this tidied summary doesn't contain useful statistics such as R-squared, the residual distribution and the p-value of the F-statistic.
Does anyone know how to write up a regression summary into a csv which would contain this useful information?
Thanks.
An alternative is to write a function that collects all the information you need in a list, as follows:
lm(mpg ~ cyl, mtcars) -> model
model_stats <- function(model) {
  label <- toString(model$call)
  list(
    data.frame(model = label, broom::augment(model)),  # per-observation fits and residuals
    data.frame(model = label, broom::tidy(model)),     # coefficient table
    data.frame(model = label, broom::glance(model))    # R-squared, F-statistic and its p-value, etc.
  )
}
out <- model_stats(model)
# Write each element to info1.csv, info2.csv, info3.csv, reusing `out` instead of recomputing it
for (i in seq_along(out)) write.csv(out[[i]], paste0("info", i, ".csv"), row.names = FALSE)
I assume the core issue in your question is that you are not handling each summary table as a data frame (or compiling them all into a list), right?
So if you just want to write a particular statistical summary (aov, TukeyHSD, augment, glance, etc.) to CSV, convert it to a data frame first.
An example adapted from the broom vignette: https://cran.r-project.org/web/packages/broom/vignettes/broom.html
library(broom)
glmfit <- glm(am ~ wt, mtcars, family = "binomial")
tidy(glmfit)
fit1 <- as.data.frame(augment(glmfit))
write.csv(fit1, "test.csv")
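For the statistics the question mentions, broom's `glance()` already returns R-squared and the F-statistic with its p-value as a one-row table, and the residual quantiles that `summary()` prints can be computed with `quantile()`. A minimal sketch; the model and the file names are chosen only for illustration.

```r
library(broom)

fit <- lm(mpg ~ cyl + wt, data = mtcars)

# tidy(): one row per coefficient (estimate, std.error, statistic, p.value)
coefs <- tidy(fit)

# glance(): one row of model-level statistics; `statistic` and `p.value` are the F-test
stats <- glance(fit)[, c("r.squared", "adj.r.squared", "statistic", "p.value", "df")]

# Residual quantiles: the same five numbers summary(fit) prints under "Residuals:"
resid_q <- quantile(residuals(fit))
names(resid_q) <- c("resid.min", "resid.q1", "resid.median", "resid.q3", "resid.max")

write.csv(coefs, "coefficients.csv", row.names = FALSE)
write.csv(data.frame(stats, as.list(resid_q)), "model-stats.csv", row.names = FALSE)
```

Keeping coefficients and model-level statistics in two files avoids mixing tables of different shapes in one CSV.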
The value returned by lm is an object which can be further processed using summary. From the value returned by summary you can access different information and process it manually.
# Data and model fit
df <- data.frame(a = rnorm(100), b = rnorm(100))
mod <- lm(a ~ b, data = df)
su <- summary(mod)
coefs <- su$coefficients
# Helper function: pad `row` with empty strings so it has ncol(mat) entries
fill <- function(row, mat) {
  c(row, rep("", ncol(mat) - length(row)))
}
# Create the output. `coefs` provides the matrix to which more information is added using `rbind`
# (the padding strings turn the result into a character matrix).
output <- rbind(
  coefs,
  `Additional stats` = fill("", coefs),
  Rsq = fill(su$r.squared, coefs),
  Adj.Rsq = fill(su$adj.r.squared, coefs),
  F = fill(su$fstatistic, coefs),  # value, numdf, dendf
  F.p.value = fill(pf(su$fstatistic[1], su$fstatistic[2], su$fstatistic[3],
                      lower.tail = FALSE), coefs))
# Write to disk with row and column names (col.names = NA adds a blank header cell for the row names)
write.table(output, file = "model-summary.csv", sep = ",", col.names = NA)
I've just noticed that sjt.lmer tables are displaying incorrect p-values, i.e., p-values that do not match the model summary. This appears to be a fairly new issue, as it worked fine last month.
Using the data and code provided in the package vignette:
library(sjPlot)
library(sjmisc)
library(sjlabelled)
library(lme4)
library(sjstats)
# load sample data
data(efc)
# prepare grouping variables
efc$grp = as.factor(efc$e15relat)
levels(x = efc$grp) <- get_labels(efc$e15relat)
efc$care.level <- rec(efc$n4pstu, rec = "0=0;1=1;2=2;3:4=4",
val.labels = c("none", "I", "II", "III"))
# data frame for fitted model
mydf <- data.frame(
neg_c_7 = efc$neg_c_7,
sex = to_factor(efc$c161sex),
c12hour = efc$c12hour,
barthel = efc$barthtot,
education = to_factor(efc$c172code),
grp = efc$grp,
carelevel = to_factor(efc$care.level)
)
# fit sample models
fit1 <- lmer(neg_c_7 ~ sex + c12hour + barthel + (1 | grp), data = mydf)
summary(fit1)
p_value(fit1, p.kr = TRUE)
[Screenshots: model summary; p_value() summary]
The sjt.lmer output does not show these p-values.
Note that the first summary comes from a model fitted with lmerTest, which computes p-values with degrees of freedom based on the Satterthwaite approximation (see the first line of the output).
p_value(), however, with p.kr = TRUE, uses the Kenward-Roger approximation from the pbkrtest package, which is a bit more conservative.
Your sjt.lmer() output seems to be messed up somehow, and I can't reproduce it with your example; my output looks OK.
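To see the two approximations side by side without sjPlot, lmerTest's `summary()` accepts a `ddf` argument. The sketch below uses lme4's `sleepstudy` data instead of `efc`, and the Kenward-Roger call assumes pbkrtest is installed; how far the two sets of p-values differ depends on the data.

```r
library(lmerTest)  # attaches lme4 (and its sleepstudy data)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Same fixed effects, two denominator-df approximations
sat <- coef(summary(fit, ddf = "Satterthwaite"))
kr  <- coef(summary(fit, ddf = "Kenward-Roger"))  # requires the pbkrtest package

p_compare <- data.frame(
  df_Satterthwaite = sat[, "df"],
  p_Satterthwaite  = sat[, "Pr(>|t|)"],
  df_KenwardRoger  = kr[, "df"],
  p_KenwardRoger   = kr[, "Pr(>|t|)"]
)
p_compare
```

If a table tool reports p-values that match neither column, it is likely using yet another method (for example a normal approximation), which is worth checking in its documentation.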
I run a nonparametric regression using the np package (npreg) and am trying to plot the results for the variable of interest, x1, holding all other variables at their means/modes.
library("np")
y <- rnorm(100)
x1 <- rnorm(100,10,30)
x2 <- rbinom(100,1,0.5)
x3 <- rbinom(100,1,0.5)
model.np <- npreg(y ~ x1 + x2 + x3)
plot(model.np)
The plots are exactly what I want, but I cannot figure out how to generate them separately "by hand". In particular, I only want the first of the three output plots.
Apparently, a detailed answer can be found in the help file for npplot, with plot.behavior being the crucial argument.
For my example, plotting only the x1-graph could be done via:
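# plot.behavior = "data" returns the plot data instead of drawing it;
# plot.errors.method = "bootstrap" requests error bands so se() has something to return
nlmodel.plot <- plot(model.np, plot.behavior = "data", plot.errors.method = "bootstrap")
y.eval <- fitted(nlmodel.plot$r1)     # partial regression fit for x1 (r1 = first regressor)
y.se <- se(nlmodel.plot$r1)           # error columns from the bootstrap
y.lower.ci <- y.eval + y.se[, 1]      # lower CI
y.upper.ci <- y.eval + y.se[, 2]      # upper CI
x1.eval <- nlmodel.plot$r1$eval[, 1]  # x1 evaluation grid stored in plot$r1
plot(x1, y)
lines(x1.eval, y.eval)
lines(x1.eval, y.lower.ci, lty = 3)
lines(x1.eval, y.upper.ci, lty = 3)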