I am trying to run a t-test on multiple columns: I want to find the change from baseline to year 1 for a number of joint angles, on the study side only. Below is an image with the first few rows and columns of the data.
I have tried using both of these functions without success:
Code 1:
res <- FAI_SLS %>%
filter(study_side == "Study")%>%
select(-id,-subject,-activity,-side,-study_side,-year) %>%
map_df(~ broom::tidy(t.test(. ~ year)), .id = 'var')
I get the following error:
Error in eval(predvars, data, env) : object 'year' not found
I tried taking out -year but I still have the same issue.
Code 2:
t(sapply(FAI_SLS%>%filter(study_side == "Study")%>%select(-id,-subject,-activity,-side,-study_side,-year), function(x)
unlist(t.test(x~FAI_SLS$year)[c("estimate","p.value","statistic","conf.int")])))
I get the following error:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': variable lengths differ (found for 'FAI_SLS$year')
Again I tried taking -year out without success.
Any suggestions on how I can fix this? Thanks
Try fitting the t-test within summarise() on all the columns you want to test (selected in across()). Here's an example with a different dataset:
library(dplyr)
library(tidyr)
data("storms")
storms %>%
  filter(year %in% c(2019, 2020)) %>%
  summarise(across(-c(name, year, status, category),
                   ~broom::tidy(t.test(. ~ year)))) %>%
  pivot_longer(everything(), names_to = "variable") %>%
  unnest(value)
#> # A tibble: 9 × 11
#> variable estimate estimate1 estimate2 statistic p.value parameter conf.low
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 month 0.0917 8.93 8.84 1.15 2.52e- 1 892. -0.0654
#> 2 day 4.29 18.2 13.9 7.49 2.34e-13 641. 3.17
#> 3 hour -0.0596 9.13 9.19 -0.128 8.99e- 1 687. -0.978
#> 4 lat 2.14 25.9 23.7 3.75 1.94e- 4 668. 1.02
#> 5 long 6.06 -60.7 -66.8 4.27 2.25e- 5 736. 3.27
#> 6 wind 8.42 58.8 50.4 4.42 1.18e- 5 529. 4.68
#> 7 pressure -4.46 989. 993. -3.03 2.59e- 3 537. -7.35
#> 8 tropicalst… 7.39 153. 145. 0.810 4.18e- 1 701. -10.5
#> 9 hurricane_… 10.9 24.1 13.2 3.92 1.02e- 4 508. 5.45
#> # … with 3 more variables: conf.high <dbl>, method <chr>, alternative <chr>
Created on 2022-06-02 by the reprex package (v2.0.1)
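As for why the two original attempts fail, my reading (an assumption, since I can't run FAI_SLS) is this: in Code 1, map_df() hands each column to t.test() as a bare vector, so the formula has nowhere to find year, whether or not select() dropped it. In Code 2, x comes from the filtered rows while FAI_SLS$year still has every row, hence "variable lengths differ". Below is a minimal base-R sketch of the second approach with the filter applied once up front. The names toy, study and angle_cols are made-up stand-ins, not the real data:

```r
set.seed(1)

# Hypothetical stand-in for FAI_SLS: two sides, two time points, two joint angles
toy <- data.frame(
  study_side = rep(c("Study", "Control"), each = 20),
  year       = rep(c(0, 1), times = 20),
  hip        = rnorm(40, mean = 30, sd = 5),
  knee       = rnorm(40, mean = 60, sd = 8)
)

# Filter once, so the outcome columns and the grouping variable have the same rows
study <- subset(toy, study_side == "Study")

# Only the outcome columns are looped over; year is taken from the same filtered data
angle_cols <- c("hip", "knee")

res <- t(sapply(study[angle_cols], function(x) {
  unlist(t.test(x ~ study$year)[c("estimate", "p.value", "statistic", "conf.int")])
}))

res
```

Each row of res is one joint angle, with the two group means, the p-value, the t statistic and the confidence limits.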
*I want to group a nested (multiply imputed) dataset and then apply a linear regression to each imputed dataset. I have tried a number of approaches, including the map approach (2) and a for loop (3), with no luck. I want the model results to look like the results from summary(mod1). Does anyone know what I am doing wrong?
# get dependencies
library(mice)
library(tidyverse)
# impute the boys dataset from mice package
boys_imp <- mice(boys)
# 1) I want to run a model like this on my multiply imputed dataset
mod1 <- boys %>%
group_by(reg) %>%
do(tidy(
lm(
data=.,
formula = wgt ~ bmi),
conf.int = T))
summary(mod1)
# A tibble: 12 × 8
# Groups: reg [6]
reg term estimate std.error statistic p.value conf.low conf.high
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 north (Intercept) -81.9 9.84 -8.32 2.48e-12 -101. -62.3
2 north bmi 6.84 0.500 13.7 2.53e-22 5.85 7.84
3 east (Intercept) -75.3 7.62 -9.89 3.21e-18 -90.4 -60.3
4 east bmi 6.29 0.420 15.0 4.53e-32 5.46 7.12
5 west (Intercept) -91.9 6.31 -14.6 2.48e-34 -104. -79.4
6 west bmi 7.17 0.347 20.7 3.49e-54 6.49 7.86
7 south (Intercept) -79.8 6.73 -11.9 1.83e-24 -93.1 -66.5
8 south bmi 6.47 0.373 17.3 1.63e-40 5.73 7.20
9 city (Intercept) -92.0 13.9 -6.61 6.75e- 9 -120. -64.2
10 city bmi 6.95 0.757 9.18 1.39e-13 5.44 8.46
11 NA (Intercept) -88.6 43.8 -2.02 2.92e- 1 -645. 468.
12 NA bmi 6.46 2.89 2.24 2.68e- 1 -30.2 43.1
# 2) the map way --------------------------------------------------------
mod_imp <- boys_imp %>%
mice::complete("all") %>%
map(group_by, reg) %>%
map(lm, formula = wgt ~ bmi) %>%
pool()
summary(mod_imp)
term estimate std.error statistic df p.value
1 (Intercept) -85.473428 3.5511961 -24.06891 715.1703 0
2 bmi 6.793622 0.1945322 34.92287 693.7835 0
# 3) for loop way-------------------------------------------------------
# nest the mids dataset
boys_imp2 <- boys_imp %>%
mice::complete("all")
dat1 <- replicate(length(boys_imp2), NULL) # preallocate same size
# run the for loop
for (i in seq_along(boys_imp2)) {
dat1[[i]] <- boys_imp2[[i]] %>%
group_by(reg) %>%
do(lm(wgt ~ bmi, data = boys_imp2[[i]]))
}
Error in `do()`:
! Results 1, 2, 3, 4, 5, ... must be data frames, not lm.
Run `rlang::last_error()` to see where the error occurred.*
I have found a solution to the problem. It involves grouping the long-format data by imputation number (.imp) and the variable of interest (reg), then mapping lm over the nested datasets. I then pool the fits within each group and finish by unnesting the results.
boys_imp %>%
  mice::complete("long", include = FALSE) %>%
  group_by(.imp, reg) %>%
  nest() %>%
  mutate(lm_model = map(data, ~lm(bmi ~ phb, data = .))) %>%
  group_by(reg) %>%
  summarise(model = list(tidy(pool(lm_model), conf.int = T))) %>%
  unnest_wider(model) %>%
  unnest(cols = c(term, estimate, std.error,
                  statistic, p.value, conf.low, conf.high))
# A tibble: 30 × 16
reg term estimate std.error statistic p.value conf.low conf.high b df dfcom fmi lambda m riv ubar
<fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 north (Intercept) 19.3 0.332 57.9 0 18.6 19.9
2 north phb.L 5.10 0.678 7.53 1.81e-10 3.75 6.46
3 north phb.Q 1.25 0.800 1.56 1.24e- 1 -0.357 2.86
4 north phb.C -0.430 0.882 -0.487 6.30e- 1 -2.25 1.39
5 north phb^4 -1.10 0.948 -1.16 2.57e- 1 -3.07 0.862
6 north phb^5 -0.156 1.08 -0.144 8.87e- 1 -2.41 2.10
7 east (Intercept) 18.7 0.244 76.8 0 18.3 19.2
8 east phb.L 4.83 0.509 9.48 4.44e-15 3.82 5.84
9 east phb.Q 1.10 0.692 1.60 1.27e- 1 -0.343 2.55
10 east phb.C -0.518 0.671 -0.772 4.49e- 1 -1.91 0.878
# … with 20 more rows
# ℹ Use `print(n = ...)` to see more rows
I have three columns, one per group, with numeric values. I want to analyze them with an ANOVA, but the examples I found all have the groups in one column and the corresponding values in a second column. Is it necessary to reshape the data like that, or is there a method that works on the columns as I currently have them? Here I attached a capture:
Thanks!
You can convert a wide table having many columns into another table having only two columns for key (group) and value (response) by pivoting the data:
library(tidyverse)
# create example data
set.seed(1337)
data <- tibble(
VIH = runif(100),
VIH2 = runif(100),
VIH3 = runif(100)
)
data
#> # A tibble: 100 × 3
#> VIH VIH2 VIH3
#> <dbl> <dbl> <dbl>
#> 1 0.576 0.485 0.583
#> 2 0.565 0.495 0.108
#> 3 0.0740 0.868 0.350
#> 4 0.454 0.833 0.324
#> 5 0.373 0.242 0.915
#> 6 0.331 0.0694 0.0790
#> 7 0.948 0.130 0.563
#> 8 0.281 0.122 0.287
#> 9 0.245 0.270 0.419
#> 10 0.146 0.488 0.838
#> # … with 90 more rows
data %>%
pivot_longer(everything()) %>%
aov(value ~ name, data = .)
#> Call:
#> aov(formula = value ~ name, data = .)
#>
#> Terms:
#> name Residuals
#> Sum of Squares 0.124558 25.171730
#> Deg. of Freedom 2 297
#>
#> Residual standard error: 0.2911242
#> Estimated effects may be unbalanced
Created on 2022-05-10 by the reprex package (v2.0.0)
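Note that printing an aov object only shows the sums of squares; the F statistic and p-value come from calling summary() on the fit. If you would rather stay in base R, stack() does the same wide-to-long reshaping. A small sketch on made-up data (wide, long and fit are illustrative names of mine):

```r
set.seed(1337)

# Made-up wide data: one numeric column per group
wide <- data.frame(
  VIH  = runif(100),
  VIH2 = runif(100),
  VIH3 = runif(100)
)

# stack() returns two columns: `values` (the response) and `ind` (the source column, as a factor)
long <- stack(wide)

# One-way ANOVA of the response on the group factor
fit <- aov(values ~ ind, data = long)

# summary() prints the ANOVA table including the F value and Pr(>F)
summary(fit)
```

So the reshaping is needed for the formula interface, but it is a one-liner either way.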
I would like to find a better way to bind together the results of any number of regressions after adding an identifier for each model. The code below is my current solution but is too manual for a large number of regressions. This is part of a larger tidy workflow so a solution inside of the tidyverse is preferred but whatever works is fine. Thanks
library(tidyverse)
library(broom)
model_dat=mtcars %>%
do(lm_1 = tidy(lm(disp~ wt*vs, data = .),conf.int=T),
lm_2=tidy(lm(cyl ~ wt*vs, data = .),conf.int=T ),
lm_3=tidy(lm(mpg ~ wt*vs, data = .),conf.int=T ))
df=model_dat %>%
select(lm_1) %>%
unnest(c(lm_1)) %>%
mutate(model="one") %>%
select(model,term,estimate,p.value:conf.high) %>%
bind_rows(
model_dat %>%
select(lm_2) %>%
unnest(c(lm_2)) %>%
mutate(model="two") %>%
select(model,term,estimate,p.value:conf.high)) %>%
bind_rows(
model_dat %>%
select(lm_3) %>%
unnest(c(lm_3)) %>%
mutate(model="three") %>%
select(model,term,estimate,p.value:conf.high))
It may be easier with map2, i.e. loop over the columns and the corresponding English words for the column positions, pluck the list element, create the 'model' column from the second argument (.y, the English word), select the columns of interest, and return a single data frame by using the _dfr variant of map
library(purrr)
library(english)
library(dplyr)
library(broom)
map2_dfr(model_dat, as.character(english(seq_along(model_dat))),
~ .x %>%
pluck(1) %>%
mutate(model = .y) %>%
select(model, term, estimate, p.value:conf.high) )
-output
# A tibble: 12 x 6
# model term estimate p.value conf.low conf.high
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 one (Intercept) -70.0 1.55e- 1 -168. 28.2
# 2 one wt 102. 8.20e- 9 76.4 128.
# 3 one vs 31.2 6.54e- 1 -110. 172.
# 4 one wt:vs -36.7 1.10e- 1 -82.2 8.82
# 5 two (Intercept) 4.31 1.28e- 5 2.64 5.99
# 6 two wt 0.849 4.90e- 4 0.408 1.29
# 7 two vs -2.19 7.28e- 2 -4.59 0.216
# 8 two wt:vs 0.0869 8.20e- 1 -0.689 0.862
# 9 three (Intercept) 29.5 6.55e-12 24.2 34.9
#10 three wt -3.50 2.33e- 5 -4.92 -2.08
#11 three vs 11.8 4.10e- 3 4.06 19.5
#12 three wt:vs -2.91 2.36e- 2 -5.40 -0.419
Or use summarise with across, unclass and then bind with bind_rows
model_dat %>%
summarise(across(everything(), ~ {
# // get the column name
nm1 <- cur_column()
# // extract the list element (.[[1]])
list(.[[1]] %>%
# // create new column by extracting the numeric part
mutate(model = english(readr::parse_number(nm1))) %>%
# // select the subset of columns, wrap in a list
select(model, term, estimate, p.value:conf.high))
}
)) %>%
# // unclass to list
unclass %>%
# // bind the list elements
bind_rows
-output
# A tibble: 12 x 6
# model term estimate p.value conf.low conf.high
# <english> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 one (Intercept) -70.0 1.55e- 1 -168. 28.2
# 2 one wt 102. 8.20e- 9 76.4 128.
# 3 one vs 31.2 6.54e- 1 -110. 172.
# 4 one wt:vs -36.7 1.10e- 1 -82.2 8.82
# 5 two (Intercept) 4.31 1.28e- 5 2.64 5.99
# 6 two wt 0.849 4.90e- 4 0.408 1.29
# 7 two vs -2.19 7.28e- 2 -4.59 0.216
# 8 two wt:vs 0.0869 8.20e- 1 -0.689 0.862
# 9 three (Intercept) 29.5 6.55e-12 24.2 34.9
#10 three wt -3.50 2.33e- 5 -4.92 -2.08
#11 three vs 11.8 4.10e- 3 4.06 19.5
#12 three wt:vs -2.91 2.36e- 2 -5.40 -0.419
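If you are free to change how the models are fitted in the first place, another option is to skip do() and keep the formulas in a named list, so the names become the identifier. This is only a sketch of that idea: the list name formulas and its labels are my own choice, and it relies on the .id argument of purrr::map_dfr():

```r
library(purrr)
library(broom)

# Named list of model formulas; the names end up in the `model` column
formulas <- list(
  one   = disp ~ wt * vs,
  two   = cyl  ~ wt * vs,
  three = mpg  ~ wt * vs
)

# Fit each model, tidy it with confidence intervals, and row-bind the results
df <- map_dfr(
  formulas,
  ~ tidy(lm(.x, data = mtcars), conf.int = TRUE),
  .id = "model"
)

df
```

Adding another regression is then just one more entry in the list.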
I have a dataframe with values for multiple macro variables. When I compute the log of the values and then the log differences, the variables turn into lists, which causes problems in my script later on.
Example code:
#Compute log of relevant macrovariables
macro[,c("hp", "unem", "m1", "inc")] <- log(macro[,c("hp", "unem", "m1", "inc")])
colnames(macro)[2:5] <- paste(colnames(macro)[2:5], "log", sep = "_")
#Computing log differences
macro$ldiff_hp <- c(-diff(macro$hp_log), na.omit)
I'm trying to unlist the columns and convert them to numeric with either of the following:
#Alternative 1
macro[,15:19]<- unlist(as.numeric(macro[,15:19]))
#Alternative 2
macro[,15:19] <- sapply(macro[,15:19],as.numeric)
It gives me the following error output:
> macro[,15:19]<- unlist(as.numeric(macro[,15:19]))
Error in unlist(as.numeric(macro[, 15:19])) :
(list) object cannot be coerced to type 'double'
Using the economics dataset from ggplot2 as example data and making use of dplyr's lag function, the log-differenced variables can be computed like so:
library(ggplot2)
library(dplyr)
macro <- ggplot2::economics
vars <- c("uempmed", "psavert")
vars_log <- paste(vars, "log", sep = "_")
vars_ldiff <- paste(vars, "ldiff", sep = "_")
#Compute log of relevant macrovariables
macro[, vars_log] <- sapply(macro[, vars], log)
# Lag values
macro[, vars_ldiff] <- sapply(macro[, vars_log], dplyr::lag)
# First Difference of logs
macro[, vars_ldiff] <- macro[, vars_log] - macro[, vars_ldiff]
macro
#> # A tibble: 574 x 10
#> date pce pop psavert uempmed unemploy uempmed_log psavert_log
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1967-07-01 507. 198712 12.6 4.5 2944 1.50 2.53
#> 2 1967-08-01 510. 198911 12.6 4.7 2945 1.55 2.53
#> 3 1967-09-01 516. 199113 11.9 4.6 2958 1.53 2.48
#> 4 1967-10-01 512. 199311 12.9 4.9 3143 1.59 2.56
#> 5 1967-11-01 517. 199498 12.8 4.7 3066 1.55 2.55
#> 6 1967-12-01 525. 199657 11.8 4.8 3018 1.57 2.47
#> 7 1968-01-01 531. 199808 11.7 5.1 2878 1.63 2.46
#> 8 1968-02-01 534. 199920 12.3 4.5 3001 1.50 2.51
#> 9 1968-03-01 544. 200056 11.7 4.1 2877 1.41 2.46
#> 10 1968-04-01 544 200208 12.3 4.6 2709 1.53 2.51
#> # ... with 564 more rows, and 2 more variables: uempmed_ldiff <dbl>,
#> # psavert_ldiff <dbl>
Created on 2020-03-23 by the reprex package (v0.3.0)
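On why the columns became lists in the first place: in c(-diff(macro$hp_log), na.omit) the second element is the function na.omit itself, and c() of a numeric vector and a function returns a list. Since diff() is one element shorter than its input, padding with NA keeps the column numeric and the right length. A tiny sketch with a made-up vector x:

```r
# Made-up series standing in for one macro variable
x <- log(c(100, 110, 121))

# What the question's code does: combines numbers with the function object na.omit
bad <- c(-diff(x), na.omit)
is.list(bad)      # TRUE, because a function cannot be stored in a numeric vector

# Pad the first position with NA instead, so the result is numeric and as long as x
good <- c(NA, diff(x))
is.numeric(good)  # TRUE
good
```

Keep the minus sign from the question (c(NA, -diff(x))) if your rows are sorted so that it gives the direction you want; I dropped it here only to keep the sketch simple.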
I am trying to run a paired t-test on pre- and post-intervention results of three intervention types. I am trying to run the test on each intervention separately using "subset" in the t.test function, but it keeps running the test on the whole sample. I cannot separate the intervention levels manually as this is a large database and I do not have access to the Excel file. Does anyone have any suggestions?
Here's the code I am using:
Treatment (intervention) levels:"Passive" "Pro" "Peer"
"Post" and "Pre" are continuous variables.
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Peer")
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Pro")
t.test(data$Post, data$Pre, paired=T, subset=data$Treatment=="Passive")
There is no subset argument (nor a data argument) for the t.test function when using the default method:
> args(stats:::t.test.default)
function (x, y = NULL, alternative = c("two.sided", "less",
"greater"), mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
You'll have to subset first,
with(subset(data, subset=Treatment=="Peer"),
t.test(Post, Pre, paired=TRUE)
)
There's also an easier way using dplyr and broom...
library(dplyr)
library(broom)
data %>%
  group_by(Treatment) %>%
  do(tidy(t.test(.$Pre, .$Post, paired=TRUE)))
Reproducible example:
set.seed(123)
data <- tibble(id=1:63, Pre=rnorm(21*3,10,5), Post=rnorm(21*3,13,5),
Treatment=sample(c("Peer","Pro","Passive"), 63, TRUE))
data
# A tibble: 63 x 4
id Pre Post Treatment
<int> <dbl> <dbl> <chr>
1 1 7.20 7.91 Pro
2 2 8.85 7.64 Peer
3 3 17.8 14.5 Peer
4 4 10.4 15.2 Peer
5 5 10.6 13.3 Passive
6 6 18.6 17.6 Passive
7 7 12.3 23.3 Pro
8 8 3.67 10.5 Peer
9 9 6.57 1.45 Pro
10 10 7.77 18.0 Passive
# ... with 53 more rows
Output:
# A tibble: 3 x 9
# Groups: Treatment [3]
Treatment estimate statistic p.value parameter conf.low conf.high method alternative
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Passive -2.41 -1.72 0.107 14 -5.42 0.592 Paired t-~ two.sided
2 Peer -3.61 -2.96 0.00636 27 -6.11 -1.10 Paired t-~ two.sided
3 Pro -1.22 -0.907 0.376 19 -4.03 1.59 Paired t-~ two.sided
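If you would rather not load any packages, the same per-group idea can be written with split() and lapply(). A minimal sketch on simulated data (dat, tests and p_values are names I made up; the argument order Post, Pre follows the question, so the sign of the estimate is flipped relative to the dplyr output above):

```r
set.seed(123)

# Simulated stand-in for the real data
dat <- data.frame(
  Pre       = rnorm(63, mean = 10, sd = 5),
  Post      = rnorm(63, mean = 13, sd = 5),
  Treatment = sample(c("Peer", "Pro", "Passive"), 63, replace = TRUE)
)

# Split the rows by treatment, then run one paired t-test per piece
tests <- lapply(split(dat, dat$Treatment), function(d) {
  t.test(d$Post, d$Pre, paired = TRUE)
})

# Pull one number out of each test, e.g. the p-value
p_values <- sapply(tests, function(tt) tt$p.value)
p_values
```

tests is a named list, so tests$Peer prints the full t-test output for that group.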