Paired t test with multiple time points - r

I have a dataset with six time points and I want to run multiple paired-sample t-tests to compare the scores between them.
The data has been transformed into long format.
I want to achieve something like this table:
I tried to use the following code, but it does not work.
stat.test <- anxiety_score %>%
  group_by(group) %>%
  pairwise_t_test(
    anxiety_score ~ time, paired = TRUE,
    p.adjust.method = "bonferroni"
  ) %>%
  select(-df, -statistic, -p) # Remove details
stat.test
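For comparison, here is a minimal base R sketch of Bonferroni-adjusted paired comparisons across six time points within each group. The data frame `toy` and its columns (`id`, `time`, `group`, `score`) are simulated and hypothetical, not the asker's data. The one assumption that matters for a paired test is that rows are ordered by subject within each time point, so that observations line up across time points.

```r
# Simulated long-format data: 10 subjects, 6 time points, 2 groups (all hypothetical)
set.seed(1)
toy <- expand.grid(id = factor(1:10), time = factor(paste0("t", 1:6)))
toy$group <- ifelse(as.integer(toy$id) <= 5, "grp1", "grp2")
toy$score <- rnorm(nrow(toy), mean = 15 + as.integer(toy$time))

# Paired tests match observations by position, so sort by id within each time point
toy <- toy[order(toy$group, toy$time, toy$id), ]

# One matrix of adjusted p-values per group (rows/columns are time points)
res <- lapply(split(toy, toy$group), function(d) {
  pairwise.t.test(d$score, d$time,
                  paired = TRUE,
                  p.adjust.method = "bonferroni")$p.value
})
res$grp1
```

Each matrix is lower triangular: with six time points there are 15 pairwise comparisons per group.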

Related

gtsummary add_p(): why are different tests performed even if only one is requested?

I am using gtsummary to produce a table with p-values from a paired Wilcoxon test. I stratified the table columns by two variables, BCLC_recoded (three modalities) and Timing (two modalities), and I want to compare a continuous variable (Cc_Fil) among the different classes. However, the p-value for the paired Wilcoxon test is obtained only for some of the strata, not all of them.
df %>%
  select(Cc_Fil, Timing, Sample, BCLC_recoded) %>%
  tbl_strata(
    strata = BCLC_recoded,
    .tbl_fun =
      ~ .x %>%
        tbl_summary(
          by = Timing,
          include = -Sample,
          digits = all_continuous() ~ 2,
          label = c(Cc_Fil ~ "cfDNA concentration (ng/μL)"),
          missing = "no") %>%
        add_p(test = list(Cc_Fil ~ "paired.wilcox.test"),
              group = Couple_of_plasma,
              pvalue_fun = ~ style_pvalue(.x, digits = 3)) %>%
        bold_p(t = 0.05, q = FALSE) %>%
        add_n(),
    .header = "**{strata}**, N = {n}"
  )
However, when running the above code I obtained the p-value for the paired Wilcoxon test only for the first two categories, whereas for the third one I obtained an unpaired Wilcoxon test.
I have also tried other variables, and the same problem occurs with every variable that has more than two modalities.

R two different code chunks to get a p-value but the code evaluates differently and I can't figure out the difference

I'm trying to figure out why these two code chunks give me different p-values for Welch's T-Test. I really just tried to do a tidy version of the base R code and create a table with both statistics. But the tidy version I'm using has a very small p-value and I'm confused as to why.
t.test(mpg ~ vs, data = mtcars) # p-value = 0.0001098
t.test(mpg ~ am, data = mtcars) # p-value = 0.001374
options(scipen = 999)
mtcars %>%
  dplyr::select(mpg, vs, am) %>%
  pivot_longer(names_to = 'names', values_to = 'values', 2:3) %>%
  nest(data = -names) %>%
  mutate(
    test = map(data, ~ t.test(.x$mpg, .x$values)), # S3 list-col
    tidied = map(test, tidy)
  ) %>%
  unnest(tidied) # vs = 0.000000000000000010038009 and am = 0.000000000000000009611758
If you run simply:
t.test(mtcars$mpg, mtcars$vs)
You'll get the same values as in your nested data example.
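So the issue is not the nesting; it's that you're performing a different kind of t-test. The formula version (mpg ~ vs) treats vs or am as a grouping variable with two levels (0, 1) and compares mean mpg between those two groups. The vector version, t.test(x, y), treats the 0/1 column itself as a second sample, so it compares the mean of mpg with the mean of the 0/1 values.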
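A minimal sketch of a tidy version that reproduces the base R results is to use the formula interface inside map(). The object name `res` is introduced here for illustration; the column names `names` and `values` come from the question's pivot_longer() call.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

res <- mtcars %>%
  select(mpg, vs, am) %>%
  pivot_longer(cols = c(vs, am), names_to = "names", values_to = "values") %>%
  nest(data = -names) %>%
  mutate(
    # Formula interface: `values` (0/1) defines the two groups being compared
    test = map(data, ~ t.test(mpg ~ values, data = .x)),
    tidied = map(test, tidy)
  ) %>%
  unnest(tidied) %>%
  select(names, statistic, p.value)

res
```

The p-values in `res` should now agree with t.test(mpg ~ vs, data = mtcars) and t.test(mpg ~ am, data = mtcars).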

What does se.fit represent exactly? How can we compute it manually?

Given the following code in R why should we have an array of standard errors instead of one standard error?
library(dplyr)
library(HistData)
data("GaltonFamilies")
set.seed(1983)
galton_heights <- GaltonFamilies %>%
  filter(gender == "male") %>%
  group_by(family) %>%
  sample_n(1) %>%
  ungroup() %>%
  select(father, childHeight) %>%
  rename(son = childHeight)
Now consider this:
fit <- galton_heights %>% lm(son ~ father, data = .)
Y_hat <- predict(fit, se.fit = TRUE)
The predicted values could be extracted by this code:
Y_hat$fit
which gives us the predicted heights of the sons for the fathers in galton_heights.
However, I do not understand why we get an array of standard errors when we run this code:
Y_hat$se.fit
What do these values refer to, and how could they be calculated manually?
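As a sketch of one way to reproduce these values: se.fit holds one standard error per fitted value, because the uncertainty of the estimated mean response depends on where the predictor value sits relative to the data. For row i with design vector x_i, the standard error is sigma_hat * sqrt(x_i' (X'X)^{-1} x_i), where sigma_hat is the residual standard error. The helper name `manual_se_fit` is hypothetical, and mtcars stands in for galton_heights so that the example runs without HistData; the same function can be applied to the `fit` object above.

```r
# Hypothetical helper: standard error of each fitted mean from an lm object
manual_se_fit <- function(fit) {
  X <- model.matrix(fit)                        # design matrix, one row per observation
  sigma_hat <- sqrt(sum(residuals(fit)^2) / df.residual(fit))
  XtX_inv <- solve(crossprod(X))                # (X'X)^{-1}
  h <- rowSums((X %*% XtX_inv) * X)             # x_i' (X'X)^{-1} x_i for every row
  sigma_hat * sqrt(h)
}

# Stand-in model on a built-in dataset
demo_fit <- lm(mpg ~ wt, data = mtcars)
manual  <- manual_se_fit(demo_fit)
builtin <- predict(demo_fit, se.fit = TRUE)$se.fit

all.equal(unname(manual), unname(builtin))
```

The standard errors are smallest for observations near the mean of the predictor and grow toward the extremes, which is why a single number would not be enough.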

Object '.' not found while piping with dplyr

I am trying to produce a survival curve using the survival package. The MWE code is as follows:
df %>%
  filter(fac <= "Limit") %>%
  survfit(Surv(tte, !is.na(event)) ~ fac, data = .) %>%
  ggsurvplot(fit = .)
I get the error Error in eval(fit$call$data) : object '.' not found
When I try to break this down further by:
survfit <- df %>%
  filter(fac <= "Limit") %>%
  survfit(Surv(tte, !is.na(event)) ~ fac, data = .)
ggsurvplot(fit = survfit)
I get an identical error. Is anyone able to figure out how to pipe from my dataframe all the way through a survival curve? The reason I would like to do this is to streamline the filtering of my dataframe in order to produce a multitude of different survival curves without having to create many subsetted dataframes.
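ggsurvplot expects an object of class "survfit" as its first argument, but it also needs the data set. When the fit is created in a pipe, the stored call records data = ., and ggsurvplot fails when it tries to evaluate fit$call$data, so pass the data explicitly through the data argument.
The example below is based on the first example in the help page for survfit.formula {survival}.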
library(dplyr)
library(survival)
library(survminer)
aml %>%
  survfit(Surv(time, status) ~ x, data = .) %>%
  ggsurvplot(data = aml)
In the question's case this would become
df %>%
  filter(fac <= "Limit") %>%
  survfit(Surv(tte, !is.na(event)) ~ fac, data = .) %>%
  ggsurvplot(data = filter(df, fac <= "Limit"))
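To avoid writing the filter twice, one option is to store the filtered data once and pass the same object to both functions. This is a minimal sketch on the aml data; the object names `aml_sub`, `fit` and `p`, and the filter condition time > 5, are made up for illustration.

```r
library(dplyr)
library(survival)
library(survminer)

# Filter once, then reuse the same object for the fit and for the plot
aml_sub <- aml %>% filter(time > 5)

fit <- survfit(Surv(time, status) ~ x, data = aml_sub)
p <- ggsurvplot(fit, data = aml_sub)
p
```

This gives up the single unbroken pipe, but the fit and the plot are guaranteed to see the same rows.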

run multiple model and save model comparison results in dataframe in r

I want to run lm models and save model comparison result and extract p-values. I would like to save all the info in a dataframe.
Using diamonds dataset as an example:
diamonds %>%
  group_by(cut) %>%
  do(model1 = lm(price ~ carat, data = .),
     model2 = lm(price ~ carat + depth, data = .)) %>%
  mutate(anova = anova(model2, model1)) %>%
  mutate(pval = anova$'Pr(>F)'[2])
I got error message below:
Error in mutate_impl(.data, dots) :
Column `anova` must be length 1 (the group size), not 6
My questions are:
Why do I get this error message, and how can I save the anova result in the dataframe?
How can I make the whole process work if lm or anova fails on some subsets? Something like try..catch.
My real data is more complicated than this; I am just using diamonds and a linear model to illustrate the idea.
Thanks a lot.
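This is a really good application of the tidyr::nest() function in conjunction with purrr and broom. The error occurs because anova() returns a table with six columns, which cannot be stored in an ordinary column; it has to go into a list column. What you do is:
- Group the data frame and nest it
- Apply a model with mutate(mod = map(data, model))
- Summarize the model using broom::tidy()
- Extract the relevant statistics.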
For more on this here's a great talk by Hadley on the subject: https://www.youtube.com/watch?v=rz3_FDVt9eg
In your case I think you can do something like this:
library(tidyverse)
library(broom)
diamonds %>%
  group_by(cut) %>%
  nest() %>%
  mutate(
    model1 = map(data, ~ lm(price ~ carat, data = .)),
    model2 = map(data, ~ lm(price ~ carat + depth, data = .))
  ) %>%
  mutate(anova = map2(model1, model2, ~ anova(.x, .y))) %>%
  mutate(tidy_anova = map(anova, broom::tidy)) %>%
  mutate(p_val = map_dbl(tidy_anova, ~ .$p.value[2])) %>%
  select(p_val)
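For the second part of the question (subsets where lm or anova fails), one possible approach is to wrap the per-subset work in purrr::possibly(), which returns a fallback value instead of stopping on an error. This is a sketch, not the only way to do it; the names `compare_models`, `safe_compare` and `res` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)  # provides the diamonds dataset

# Fit both models on one subset and return the p-value of the comparison
compare_models <- function(d) {
  m1 <- lm(price ~ carat, data = d)
  m2 <- lm(price ~ carat + depth, data = d)
  anova(m1, m2)[["Pr(>F)"]][2]
}

# If anything inside fails, return NA instead of aborting the whole pipeline
safe_compare <- possibly(compare_models, otherwise = NA_real_)

res <- diamonds %>%
  nest(data = -cut) %>%
  mutate(p_val = map_dbl(data, safe_compare)) %>%
  select(cut, p_val)

res

# A subset with no rows makes lm() fail; the wrapper turns that into NA
safe_compare(diamonds[0, ])
```

Subsets that fail then show up as NA in p_val and can be inspected separately.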

Resources