Is it possible to have gtsummary::tbl_summary compute the mean excluding outliers? For example in the following code I present sample data of some z-scores. Is it possible to specify what, or add a clause, to how gtsummary::tbl_summary handles each column?
set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
outcome1=runif(n, min=-3.6, max=2.3),
outcome2=runif(n, min=-1.9, max=3.3),
outcome3=runif(n, min=-2.5, max=2.8),
outcome4=runif(n, min=-3.1, max=2.2))
dat %>% select(-c(id)) %>% tbl_summary(by=treat, statistic = list(all_continuous() ~ "{mean} ({min} to {max})"))
For example, suppose I want the table to report the mean of outcome1only in cases where outcome1 >= -2.9 and for outcome2 only when cases are outcome2 < 3.0 etc.
Many thanks in advance for any guidance offered.
You can define a new mean function that excludes outlying values. You can define the outlier in any way you'd like. Then pass that function to tbl_summary(). Example below!
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'
set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
outcome1=runif(n, min=-3.6, max=2.3),
outcome2=runif(n, min=-1.9, max=3.3),
outcome3=runif(n, min=-2.5, max=2.8),
outcome4=runif(n, min=-3.1, max=2.2))
mean_no_extreme <- function(x) {
x <- na.omit(x)
sd <- sd(x)
mean <- mean(x)
# calculate mean excluding extremes
mean(x[x >= mean - sd * 3 & x <= mean + sd * 3])
}
dat %>%
select(-c(id)) %>%
tbl_summary(
by=treat,
statistic = all_continuous() ~ "{mean_no_extreme} ({min} to {max})"
) %>%
as_kable()
Characteristic
Control, N = 527
Treat, N = 473
outcome1
-0.64 (-3.59 to 2.30)
-0.70 (-3.60 to 2.30)
outcome2
0.68 (-1.89 to 3.30)
0.78 (-1.87 to 3.28)
outcome3
0.20 (-2.47 to 2.80)
0.23 (-2.48 to 2.80)
outcome4
-0.36 (-3.09 to 2.19)
-0.41 (-3.10 to 2.20)
Created on 2022-03-22 by the reprex package (v2.0.1)
Related
I made a nls loop and get values calculated in console. Now I want to extract those values, specify which values are from which group and put everything in a dataframe to continue working.
my loop so far:
for (i in seq_along(trtlist2)) { loopmm.nls <-
nls(rate ~ (Vmax * conc /(Km + conc)),
data=subset(M3, M3$trtlist==trtlist2[i]),
start=list(Km=200, Vmax=2), trace=TRUE )
summary(loopmm.nls)
print(summary(loopmm.nls))
}
the output in console: (this is what I want to extract and put in a dataframe, I have this same "parameters" thing like 20 times)
Parameters:
Estimate Std. Error t value Pr(>|t|)
Km 23.29820 9.72304 2.396 0.0228 *
Vmax 0.10785 0.01165 9.258 1.95e-10 ***
---
different ways of extracting data from the console that work but not in the loop (so far!)
#####extract data in diff ways from nls#####
## extract coefficients as matrix
Kinall <- summary(mm.nls)$parameters
## extract coefficients save as dataframe
Kin <- as.data.frame(Kinall)
colnames(Kin) <- c("values", "SE", "T", "P")
###create Km Vmax df
Kms <- Kin[1, ]
Vmaxs <- Kin[2, ]
#####extract coefficients each manually
Km <- unname(coef(summary(mm.nls))["Km", "Estimate"])
Vmax <- unname(coef(summary(mm.nls))["Vmax", "Estimate"])
KmSE <- unname(coef(summary(mm.nls))["Km", "Std. Error"])
VmaxSE <- unname(coef(summary(mm.nls))["Vmax", "Std. Error"])
KmP <- unname(coef(summary(mm.nls))["Km", "Pr(>|t|)"])
VmaxP <- unname(coef(summary(mm.nls))["Vmax", "Pr(>|t|)"])
KmT <- unname(coef(summary(mm.nls))["Km", "t value"])
VmaxT <- unname(coef(summary(mm.nls))["Vmax", "t value"])
one thing that works if you extract data through append, but somehow that only works for "estimates" not the rest
Kms <- append(Kms, unname(coef(loopmm.nls)["Km"] ))
Vmaxs <- append(Vmaxs, unname(coef(loopmm.nls)["Vmax"] ))
}
Kindf <- data.frame(trt = trtlist2, Vmax = Vmaxs, Km = Kms)
I would just keep everything in the dataframe for ease. You can nest by the group and then run the regression then pull the coefficients out. Just make sure you have tidyverse and broom installed on your computer.
library(tidyverse)
#example
mtcars |>
nest(data = -cyl) |>
mutate(model = map(data, ~nls(mpg~hp^b,
data = .x,
start = list(b = 1))),
clean_mod = map(model, broom::tidy)) |>
unnest(clean_mod) |>
select(-c(data, model))
#> # A tibble: 3 x 6
#> cyl term estimate std.error statistic p.value
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 6 b 0.618 0.0115 53.6 2.83e- 9
#> 2 4 b 0.731 0.0217 33.7 1.27e-11
#> 3 8 b 0.504 0.0119 42.5 2.46e-15
#what I expect will work for your data
All_M3_models <- M3 |>
nest(data = -trtlist) |>
mutate(model = map(data, ~nls(rate ~ (Vmax * conc /(Km + conc)),
data=.x,
start=list(Km=200, Vmax=2))),
clean_mod = map(model, broom::tidy))|>
unnest(clean_mod) |>
select(-c(data, model))
I'm trying to use emmeans to test "contrasts of contrasts" with custom orthogonal contrasts applied to a zero-inflated negative binomial model. The study design has 4 groups (study_group: grp1, grp2, grp3, grp4), each of which is assessed at 3 timepoints (time: Time1, Time2, Time3).
With the code below, I am able to get very close to, but not exactly, what I want. The contrasts that emerge are expressed in terms of ratios such as grp1/grp2, grp1/grp3,..., grp3/grp4 ("lower over higher"; see output following code).
What would be immensely helpful to me to have a way to flip these ratios to be grp2/grp1, grp3/grp1,..., grp4/grp3 ("higher over lower"). I've tried sticking reverse=TRUE in various spots, but to no effect.
Short of re-leveling the study_group factor, is there anyway to do this in emmeans?
Thanks!
library(glmmTMB)
library(emmeans)
set.seed(3456)
# Building grid for study design: 4 groups of 3 sites,
# each with 20 participants observed 3 times
site <- rep(1:12, each=60)
pid <- 1000*site+10*(rep(rep(1:20,each=3),12))
study_group <- c(rep("grp1",180), rep("grp2",180), rep("grp3",180), rep("grp4",180))
grp_num <- c(rep(0,180), rep(1,180), rep(2,180), rep(3,180))
time <- c(rep(c("Time1", "Time2", "Time3"),240))
time_num <- c(rep(c(0:2),240))
# Site-level random effects (intercepts)
site_eff_count = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
site_eff_zeros = rep(rnorm(12, mean = 0, sd = 0.5), each = 60)
# Simulating a neg binomial outcome
y_count <- rnbinom(n = 720, mu=exp(3.25 + grp_num*0.15 + time_num*-0.20 + grp_num*time_num*0.15 + site_eff_count), size=0.8)
# Simulating some extra zeros
log_odds = (-1.75 + grp_num*0.2 + time_num*-0.40 + grp_num*time_num*0.50 + site_eff_zeros)
prob_1 = plogis(log_odds)
prob_0 = 1 - prob_1
y_zeros <- rbinom(n = 720, size = 1, prob = prob_0)
# Building datasest with ZINB-ish outcome
data_ZINB <- data.frame(site, pid, study_group, time, y_count, y_zeros)
data_ZINB$y_obs <- ifelse(y_zeros==1, y_count, 0)
# Estimating ZINB GLMM in glmmTMB
mod_ZINB <- glmmTMB(y_obs ~ 1
+ study_group + time + study_group*time
+ (1|site),
family=nbinom2,
zi = ~ .,
data=data_ZINB)
#summary(mod_ZINB)
# Getting model-estimated "cell" means for conditional (non-zero) sub-model
# in response (not linear predictor) scale
count_means <- emmeans(mod_ZINB,
pairwise ~ time | study_group,
component="cond",
type="response",
adjust="none")
# count_means
# Defining custom contrast function for orthogonal time contrasts
# contr1 = Time 2 - Time 1
# contr2 = Time 3 - Times 1 and 2
compare_arms.emmc <- function(levels) {
k <- length(levels)
contr1 <- c(-1,1,0)
contr2 <- c(-1,-1,2)
coef <- data.frame()
coef <- as.data.frame(lapply(seq_len(k - 1), function(i) {
if(i==1) contr1 else contr2
}))
names(coef) <- c("T1vT2", "T1T2vT3")
attr(coef, "adjust") = "none"
coef
}
# Estimating pairwise between-group "contrasts of contrasts"
# i.e., testing if time contrasts differ across groups
compare_arms_contrast <- contrast(count_means[[1]],
interaction = c("compare_arms", "pairwise"),
by = NULL)
compare_arms_contrast
applying theemmeans::contrast function as above yields this:
time_compare_arms study_group_pairwise ratio SE df null t.ratio p.value
T1vT2 grp1 / grp2 1.091 0.368 693 1 0.259 0.7957
T1T2vT3 grp1 / grp2 0.623 0.371 693 1 -0.794 0.4276
T1vT2 grp1 / grp3 1.190 0.399 693 1 0.520 0.6034
T1T2vT3 grp1 / grp3 0.384 0.241 693 1 -1.523 0.1283
T1vT2 grp1 / grp4 0.664 0.245 693 1 -1.108 0.2681
.
.
.
T1T2vT3 grp3 / grp4 0.676 0.556 693 1 -0.475 0.6346
Tests are performed on the log scale
The answer, provided by Russ Lenth in the comments and in the emmeans documentation for the contrast function, is to replace pairwise with revpairwise in the contrast function call.
Similarly to here, I am running a series of regressions on subgroups (all combinations of year and group) using tidyr.
year <- rep(2014:2015, length.out = 10000)
group <- sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000)
value <- sample(10000, replace = T)
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- data.frame(year = year, group = group, value = value, female=female, smoker = smoker)
library(dplyr)
library(broom)
library(stargazer)
# create list of dfs
table_list <- dta %>%
group_by(year, group) %>%
group_split()
# apply the model to each df and produce stargazer result
model_list <- lapply(table_list, function(x) probitmfx(smoker ~ female, data = x))
stargazer(model_list, type = "text")
I get an error saying
% Error: Unrecognized object type.
Anybody know how I can get around this issue?
As Colin noted in the comments, this type of model does not appear to be supported by stargazer. However, it is supported out-of-the-box by the modelsummary package (disclaimer: I am the author).
You can omit the output argument altogether to get a nice HTML table, or change it to save your table to LaTeX, Word, or lots of other formats.
# Code from the original question
library(mfx)
library(dplyr)
year <- rep(2014:2015, length.out = 10000)
group <- sample(c(0,1,2,3,4,5,6), replace=TRUE, size=10000)
value <- sample(10000, replace = T)
female <- sample(c(0,1), replace=TRUE, size=10000)
smoker <- sample(c(0,1), replace=TRUE, size=10000)
dta <- data.frame(year = year, group = group, value = value, female=female, smoker = smoker)
table_list <- dta %>%
group_by(year, group) %>%
group_split()
model_list <- lapply(table_list, function(x) probitmfx(smoker ~ female, data = x))
# New code
library(modelsummary)
modelsummary(model_list, output = "markdown")
Model 1
Model 2
Model 3
Model 4
Model 5
Model 6
Model 7
Model 8
Model 9
Model 10
Model 11
Model 12
Model 13
Model 14
female
0.022
-0.026
-0.033
0.013
-0.030
-0.001
-0.003
-0.073
0.006
-0.041
0.075
-0.006
-0.009
0.023
(0.039)
(0.038)
(0.036)
(0.037)
(0.037)
(0.038)
(0.037)
(0.038)
(0.036)
(0.038)
(0.037)
(0.038)
(0.037)
(0.038)
Num.Obs.
670
688
761
727
740
675
739
688
751
703
733
710
737
678
AIC
932.4
957.3
1058.1
1009.6
1029.2
939.7
1027.6
953.8
1044.9
977.4
1016.0
988.2
1025.6
943.5
BIC
941.4
966.4
1067.4
1018.8
1038.4
948.7
1036.8
962.9
1054.1
986.5
1025.2
997.4
1034.8
952.5
Log.Lik.
-464.206
-476.648
-527.074
-502.806
-512.588
-467.837
-511.809
-474.924
-520.425
-486.682
-506.004
-492.123
-510.810
-469.740
I tried to perform independent t-test for many columns of a dataframe. For example, i created a data frame
set seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
To run the test, i used with(df, t.test(y ~ group))
with(test_data, t.test(a ~ grp))
with(test_data, t.test(b ~ grp))
with(test_data, t.test(c ~ grp))
I would like to have the outputs like this
mean in group m mean in group y p-value
9.747412 9.878820 0.6944
15.12936 16.49533 0.07798
20.39531 20.20168 0.9027
I wonder how can I achieve the results using
1. for loop
2. apply()
3. perhaps dplyr
This link R: t-test over all columns is related but it was 6 years old. Perhaps there are better ways to do the same thing.
Use select_if to select only numeric columns then use purrr:map_df to apply t.test against grp. Finally use broom:tidy to get the results in tidy format
library(tidyverse)
res <- test_data %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
#> # A tibble: 3 x 11
#> var estimate estimate1 estimate2 statistic p.value parameter conf.low
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a -0.259 9.78 10.0 -0.587 0.565 16.2 -1.19
#> 2 b 0.154 15.0 14.8 0.169 0.868 15.4 -1.78
#> 3 c -0.359 20.4 20.7 -0.287 0.778 16.5 -3.00
#> # ... with 3 more variables: conf.high <dbl>, method <chr>,
#> # alternative <chr>
Created on 2019-03-15 by the reprex package (v0.2.1.9000)
Simply extract the estimate and p-value results from t.test call while iterating through all needed columns with sapply. Build formulas from a character vector and transpose with t() for output:
formulas <- paste(names(test_data)[1:(ncol(test_data)-1)], "~ grp")
output <- t(sapply(formulas, function(f) {
res <- t.test(as.formula(f))
c(res$estimate, p.value=res$p.value)
}))
Input data (seeded for reproducibility)
set.seed(333)
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
Output result
# mean in group m mean in group y p.value
# a ~ grp 9.775477 10.03419 0.5654353
# b ~ grp 14.972888 14.81895 0.8678149
# c ~ grp 20.383679 20.74238 0.7776188
As you asked for a for loop:
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
meanM=NULL
meanY=NULL
p.value=NULL
for (i in 1:(ncol(test_data)-1)){
meanM=as.data.frame(rbind(meanM, t.test(test_data[,i] ~ grp)$estimate[1]))
meanY=as.data.frame(rbind(meanY, t.test(test_data[,i] ~ grp)$estimate[2]))
p.value=as.data.frame(rbind(p.value, t.test(test_data[,i] ~ grp)$p.value))
}
cbind(meanM, meanY, p.value)
It works, but I am a beginner in R. So maybe there is a more efficient solution
Using lapply this is rather easy.
I have tested the code with set.seed(7060) before creating the dataset, in order to make the results reproducible.
tests_list <- lapply(letters[1:3], function(x) t.test(as.formula(paste0(x, "~ grp")), data = test_data))
result <- do.call(rbind, lapply(tests_list, `[[`, "estimate"))
pval <- sapply(tests_list, `[[`, "p.value")
result <- cbind(result, p.value = pval)
result
# mean in group m mean in group y p.value
#[1,] 9.909818 9.658813 0.6167742
#[2,] 14.578926 14.168816 0.6462151
#[3,] 20.682587 19.299133 0.2735725
Note that a real life application would use names(test_data)[1:3], not letters[1:3], in the first lapply instruction.
This should be a comment rather than an answer, but I'll make it an answer. The reason is that the accepted answer is awesome but with one caveat that may cost others hours, which is at least the case for me.
The original data posted by OP
a <- rnorm(20, 10, 1)
b <- rnorm(20, 15, 2)
c <- rnorm(20, 20, 3)
grp <- rep(c('m', 'y'),10)
test_data <- data.frame(a, b, c, grp)
The answer provided by #Tung
library(tidyverse)
res <- test_data %>%
select_if(is.numeric) %>%
map_df(~ broom::tidy(t.test(. ~ grp)), .id = 'var')
res
The problem, or more accurately, the caveat, of this answer is that one has to define the grp variable separately. Having the group variable outside of the dataframe is not a common practice as far as I know. So, even the answer is neat, it may be better to point out this operation (define group variable outside of the dataframe). Therefore, I use this comment like answer in the hope to save some time for those late comers.
I have a following model:
coxph(Surv(fulength, mortality == 1) ~ pspline(predictor))
where is fulength is a duration of follow-up (including mortality), predictor is a predictor of mortality.
The output of the command above is this:
coef se(coef) se2 Chisq DF p
pspline(predictor), line 0.174 0.0563 0.0562 9.52 1.00 0.002
pspline(predictor), nonl 4.74 3.09 0.200
How can I plot this model so that I get the nice curvy line with 95% confidence bands and hazard ratio on the y axis? What I am aiming for is something similar to this:
This is when you get when you run the first example in ?cph of the rms-package:
n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('Male','Female'), n,
rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens,1,0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt,e)
f <- cph(S ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
cox.zph(f, "rank") # tests of PH
anova(f)
plot(Predict(f, age, sex)) # plot age effect, 2 curves for 2 sexes
Because the rms/Hmisc package combo uses lattice plots, annotation with a marginal age-density feature would need to be done with lattice-functions. On the other hand, if you want to change the response value to relative hazard you can just add a 'fun=exp' argument to the Predict call and relable the graph to get:
png(); plot(Predict(f, age, sex, fun=exp), ylab="Relative Hazard");dev.off()