How to transform multiple comparison results (*) into letter markings - R

I am conducting Kruskal-Wallis rank tests with the packages rstatix and agricolae, and I am unsure how to convert the results of one package into the format of the other.
Here is my sample code:
data("iris")
#### agricolae
library(agricolae)
print( kruskal(iris[, 1], factor(iris[, 5]), group=TRUE, p.adj="bonferroni") )
##
#> $groups
#>            iris[, 1] groups
#> virginica     114.21      a
#> versicolor     82.65      b
#> setosa         29.64      c
#### rstatix
library(rstatix)
iris %>% dunn_test(Sepal.Length ~ Species)
#> # A tibble: 3 x 9
#>   .y.     group1  group2    n1    n2 statistic        p    p.adj p.adj.signif
#> * <chr>   <chr>   <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>
#> 1 Sepal.~ setosa  versi~    50    50      6.11 1.02e- 9 2.04e- 9 ****
#> 2 Sepal.~ setosa  virgi~    50    50      9.74 2.00e-22 6.00e-22 ****
#> 3 Sepal.~ versic~ virgi~    50    50      3.64 2.77e- 4 2.77e- 4 ***
I want to convert the significance stars ('****', '***') into letters ('a', 'b', 'c'). Also, are there better packages or ways to do this kind of nonparametric multiple-group comparison? Please show the code, thanks!
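A minimal sketch of one way to get letters from the rstatix output, assuming the multcompView package is installed (it is not used in the question). multcompLetters() takes a vector of p-values named "groupA-groupB" and returns a compact letter display; groups that share a letter are not significantly different at the chosen threshold. The Bonferroni adjustment is chosen here only to mirror the agricolae call above.

```r
library(rstatix)
library(multcompView)  # assumption: installed separately

# Dunn's test with the same Bonferroni adjustment used in the agricolae call
dunn <- dunn_test(iris, Sepal.Length ~ Species, p.adjust.method = "bonferroni")

# multcompLetters() expects a named vector of p-values, names written "groupA-groupB"
p_vec <- setNames(dunn$p.adj, paste(dunn$group1, dunn$group2, sep = "-"))

# Groups sharing a letter are not significantly different at the 0.05 level
cld <- multcompLetters(p_vec, threshold = 0.05)
cld$Letters
```

Here all three pairs are significant, so each species should get its own letter. The letters may be assigned in a different order from agricolae, which sorts groups by mean rank; reorder the result if you need the same convention.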

Related

Use `dplyr` to run all pairwise comparisons and add Cohen's *d*

I am using rstatix to perform multiple t-tests on a dataset, which works very well, but I also need Cohen's d. rstatix also includes a function to calculate Cohen's d, but it takes the original dataset, not the table generated by the t_test function.
library(tidyverse)
library(rstatix)
iris %>%
t_test(Petal.Width ~ Species, paired = FALSE, var.equal = FALSE)
Which gives me:
# A tibble: 3 × 10
.y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
1 Petal.Width setosa versicolor 50 50 -34.1 74.8 2.72e-47 5.44e-47 ****
2 Petal.Width setosa virginica 50 50 -42.8 63.1 2.44e-48 7.32e-48 ****
3 Petal.Width versicolor virginica 50 50 -14.6 89.0 2.11e-25 2.11e-25 ****
This output would be perfect if it had an additional column with Cohen's d for each t-test. How do I get Cohen's d?
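One option is to compute the effect sizes with cohens_d() and join them to the t-test table. cohens_d() should use the same settings as the t-test (unpaired, unequal variances), and naming the join columns avoids relying on an implicit join:
library(dplyr)
library(rstatix)
iris %>%
cohens_d(Petal.Width ~ Species, paired = FALSE, var.equal = FALSE) %>%
inner_join(iris %>%
t_test(Petal.Width ~ Species, paired = FALSE, var.equal = FALSE),
by = c(".y.", "group1", "group2", "n1", "n2")
)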
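To sanity-check what the effect-size column means, here is a base-R sketch of Cohen's d for one pair (setosa vs versicolor), using the pooled standard deviation. The helper cohen_d_manual is hypothetical and not part of rstatix. With equal group sizes, as in iris (50 per species), the pooled SD equals the square root of the mean of the two variances, so this should be comparable to the cohens_d() value for that row, up to sign convention; treat that as something to check, not a guarantee.

```r
# Hypothetical helper: Cohen's d for two independent samples with pooled SD
cohen_d_manual <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  sd_pooled <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sd_pooled
}

setosa     <- iris$Petal.Width[iris$Species == "setosa"]
versicolor <- iris$Petal.Width[iris$Species == "versicolor"]

# Negative because setosa has the smaller mean petal width
d_sv <- cohen_d_manual(setosa, versicolor)
d_sv
```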

R - how to apply pairwise_t_test by subgroups across multiple columns

I'm trying to determine whether any of 40 measured variables differs significantly between the two levels of a dichotomous classification, within each of 4 subgroups.
The data have a Y/N factor column 'class', a factor column 'subgroup' with levels A, B, C and D, and then 40 numeric columns.
So far I can run the t-test for each variable using purrr::map:
ttest_list<- purrr::map(names(Project_Data)[3:40], ~pairwise_t_test(reformulate('class', response = .x), data = Project_Data))
I get a list with 40 tibbles like below:
[[1]]
# A tibble: 1 x 9
.y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
1 valine_NMR Y N 220 382 0.00155 ** 0.00155 **
Going one variable at a time, I can use group_by() and get:
pwc_valine <- Project_Data %>%
group_by(subgroup) %>%
pairwise_t_test(valine_NMR ~ class, p.adjust.method = "bonferroni")
pwc_valine
subgroup .y. group1 group2 n1 n2 p p.signif p.adj p.adj.signif
* <fct> <chr> <chr> <chr> <int> <int> <dbl> <chr> <dbl> <chr>
1 A valine_NMR Y N 17 28 0.00619 ** 0.00619 **
2 B valine_NMR Y N 105 111 0.346 ns 0.346 ns
3 C valine_NMR Y N 86 126 0.000124 *** 0.000124 ***
4 D valine_NMR Y N 12 117 0.772 ns 0.772 ns
How do I apply the pairwise_t_test across all the columns while keeping subgroups?
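A minimal sketch of one approach: combine the two working pieces from the question by mapping over the column names and putting group_by(subgroup) inside the lambda, then row-binding the results. Project_Data is not available, so the data frame below is a hypothetical stand-in with only three numeric columns (the names leucine_NMR and glucose_NMR are made up); it assumes, as described, that the first two columns are class and subgroup.

```r
library(dplyr)
library(purrr)
library(rstatix)

set.seed(1)
# Hypothetical stand-in for Project_Data: 2 factor columns + 3 numeric columns
Project_Data <- data.frame(
  class       = factor(sample(c("Y", "N"), 200, replace = TRUE)),
  subgroup    = factor(sample(c("A", "B", "C", "D"), 200, replace = TRUE)),
  valine_NMR  = rnorm(200),
  leucine_NMR = rnorm(200),
  glucose_NMR = rnorm(200)
)

# Every column after class and subgroup is a measured variable
vars <- names(Project_Data)[-(1:2)]

# One grouped pairwise_t_test per variable, stacked into a single tibble
pwc_all <- map_dfr(vars, ~ Project_Data %>%
  group_by(subgroup) %>%
  pairwise_t_test(reformulate("class", response = .x),
                  p.adjust.method = "bonferroni"))

pwc_all
```

The result has one row per subgroup and variable, with the variable name in the .y. column. Note that the Bonferroni adjustment here is applied within each single Y/N comparison only; correcting across all 40 variables would need a separate p.adjust() step on the combined table.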

How do I iterate over multiple variables

I have a sample dataset as below:
Day<-c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
Group<-c("A","A","A","B","B","B","C","C","C","A","A","A","A","B","B","B","C","C","C")
Rain<-c(4,4,6,5,3,4,5,5,3,6,6,6,5,3,3,3,2,5,2)
UV<-c(6,6,7,8,5,6,5,6,6,6,7,7,8,8,5,6,8,5,7)
dat<-data.frame(Day,Group,Rain,UV)
I want to run a Kruskal-Wallis test comparing 'A', 'B' and 'C' in "Group" for each of the variables "Rain" and "UV".
At present, I subset the variables one by one and run the Kruskal-Wallis test on each:
library(rstatix)
library(tidyverse)
dat_Rain<-dat%>%select(c(Day,Group,Rain))
dat_Rain%>%
group_by(Day) %>%
kruskal_test(Rain ~ Group)
How do I repeat the Kruskal-Wallis test for multiple variables (Rain, UV) in this dataset? Thanks.
You can define the columns to which you want to apply kruskal_test and use map_df to combine all the results into one data frame.
library(rstatix)
library(tidyverse)
cols <- c('Rain', 'UV')
map_df(cols, ~dat %>% group_by(Day) %>% kruskal_test(reformulate('Group', .x)))
# Day .y. n statistic df p method
# <dbl> <chr> <int> <dbl> <int> <dbl> <chr>
#1 1 Rain 9 0.505 2 0.777 Kruskal-Wallis
#2 2 Rain 10 6.52 2 0.0384 Kruskal-Wallis
#3 1 UV 9 1.16 2 0.56 Kruskal-Wallis
#4 2 UV 10 0.423 2 0.809 Kruskal-Wallis
Alternatively, this can be done with lapply and a helper function, using bind_rows to combine the resulting list into one data frame:
Day<-c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
Group<-c("A","A","A","B","B","B","C","C","C","A","A","A","A","B","B","B","C","C","C")
Rain<-c(4,4,6,5,3,4,5,5,3,6,6,6,5,3,3,3,2,5,2)
UV<-c(6,6,7,8,5,6,5,6,6,6,7,7,8,8,5,6,8,5,7)
dat<-data.frame(Day,Group,Rain,UV)
library(rstatix)
library(tidyverse)
kt <- function(x, data) {
fmla <- as.formula(paste(x, "~ Group"))
data %>%
group_by(Day) %>%
kruskal_test(fmla)
}
lapply(c("Rain", "UV"), kt, data = dat) %>%
bind_rows()
#> # A tibble: 4 x 7
#> Day .y. n statistic df p method
#> <dbl> <chr> <int> <dbl> <int> <dbl> <chr>
#> 1 1 Rain 9 0.505 2 0.777 Kruskal-Wallis
#> 2 2 Rain 10 6.52 2 0.0384 Kruskal-Wallis
#> 3 1 UV 9 1.16 2 0.56 Kruskal-Wallis
#> 4 2 UV 10 0.423 2 0.809 Kruskal-Wallis
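Another option, sketched here without a loop, is to reshape the data to long format so that each variable becomes a group, then let the grouped kruskal_test() handle the iteration. This is a swapped-in technique (pivot_longer from tidyr), not one of the answers above; the column names variable and value are arbitrary choices.

```r
library(dplyr)
library(tidyr)
library(rstatix)

Day   <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2)
Group <- c("A","A","A","B","B","B","C","C","C","A","A","A","A","B","B","B","C","C","C")
Rain  <- c(4,4,6,5,3,4,5,5,3,6,6,6,5,3,3,3,2,5,2)
UV    <- c(6,6,7,8,5,6,5,6,6,6,7,7,8,8,5,6,8,5,7)
dat   <- data.frame(Day, Group, Rain, UV)

# Stack Rain and UV into one 'value' column labelled by 'variable',
# then run one Kruskal-Wallis test per variable and Day
res <- dat %>%
  pivot_longer(c(Rain, UV), names_to = "variable", values_to = "value") %>%
  group_by(variable, Day) %>%
  kruskal_test(value ~ Group)

res
```

The statistics should match the map_df and lapply versions; the variable name ends up in the variable column, while .y. is simply "value".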

conditionally mutating column values using `dplyr`

I am using WRS2 to carry out robust pairwise comparisons. One problem is that it drops the group level names from the comparison table and stores them in a separate element of the output.
# setup
set.seed(123)
library(WRS2)
library(tidyverse)
# robust pairwise comparisons
x <- lincon(libido ~ dose, data = viagra, tr = 0.1)
# comparisons
x$comp
#> Group Group psihat ci.lower ci.upper p.value
#> [1,] 1 2 -1.0 -3.440879 1.44087853 0.25984505
#> [2,] 1 3 -2.8 -5.536161 -0.06383861 0.04914871
#> [3,] 2 3 -1.8 -4.536161 0.93616139 0.17288911
# vector with group level names
x$fnames
#> [1] "placebo" "low" "high"
I can convert it to a tibble:
# converting to tibble
suppressMessages(as_tibble(x$comp, .name_repair = "unique")) %>%
dplyr::rename(group1 = Group...1, group2 = Group...2)
#> # A tibble: 3 x 6
#> group1 group2 psihat ci.lower ci.upper p.value
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 -1 -3.44 1.44 0.260
#> 2 1 3 -2.8 -5.54 -0.0638 0.0491
#> 3 2 3 -1.8 -4.54 0.936 0.173
I would then like to replace the numeric values in the group columns with the actual names stored in fnames (so 1 becomes fnames[1], 2 becomes fnames[2], and so on).
The final data frame should look something like this:
#> # A tibble: 3 x 6
#> group1 group2 psihat ci.lower ci.upper p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 placebo low -1 -3.44 1.44 0.260
#> 2 placebo high -2.8 -5.54 -0.0638 0.0491
#> 3 low high -1.8 -4.54 0.936 0.173
In this case it would be easy to copy-paste the three values, but I want a general approach that works no matter the number of levels. How can I do this using dplyr?
One option is to use a named vector as a lookup table with the tidyverse. This matches by value rather than by index position, i.e. it still works if the values in the 'Group' columns are not in sequence or are character:
library(dplyr)
as_tibble(x$comp, .name_repair = 'unique') %>%
mutate(across(starts_with("Group"),
~ setNames(x$fnames, seq_along(x$fnames))[as.character(.)]))
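To illustrate the "matches by value" point without needing WRS2, here is a sketch with a hypothetical stand-in for x$comp whose group codes are 10, 20 and 30 instead of 1, 2 and 3. Both comp and lookup are made-up objects; index-based subsetting such as fnames[group] would fail here, while the name-based lookup still works.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for x$comp: group codes 10/20/30 instead of 1/2/3
comp <- matrix(c(10, 10, 20,      # first Group column
                 20, 30, 30,      # second Group column
                 -1, -2.8, -1.8), # psihat
               ncol = 3,
               dimnames = list(NULL, c("Group", "Group", "psihat")))

# Hypothetical lookup: names are the codes, values are the level labels
lookup <- c("10" = "placebo", "20" = "low", "30" = "high")

res <- suppressMessages(as_tibble(comp, .name_repair = "unique")) %>%
  rename(group1 = Group...1, group2 = Group...2) %>%
  # as.character() turns 10 into "10" so it can index the named vector
  mutate(across(c(group1, group2), ~ unname(lookup[as.character(.x)])))

res
```

unname() drops the lookup names that would otherwise travel along with the new character columns.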
Does this fulfil your needs?
group_names <- c("A","B","C")
df <- data.frame(group=c(1,2,3))
library(dplyr)
df %>% mutate(group = group_names[group])
group
1 A
2 B
3 C
Here's an approach using the recode function, with the recoding vector built programmatically from the data:
# Setup
set.seed(123)
library(WRS2)
library(tidyverse)
x <- lincon(libido ~ dose, data = viagra, tr = 0.1)
# Create recoding vector
recode.vec = x$fnames %>% set_names(1:length(x$fnames))
# Recode columns
x.comp = x$comp %>%
as_tibble(.name_repair=make.unique) %>%
mutate(across(starts_with("Group"), ~recode(., !!!recode.vec)))
Output:
x.comp
#> # A tibble: 3 x 6
#> Group Group.1 psihat ci.lower ci.upper p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 placebo low -1 -3.44 1.44 0.260
#> 2 placebo high -2.8 -5.54 -0.0638 0.0491
#> 3 low high -1.8 -4.54 0.936 0.173
Try this tidyverse approach: extract the objects as tibbles, reshape the data to long format, and use left_join() to attach the group labels. Here is code that gets close to what you want:
# setup
set.seed(123)
library(WRS2)
library(tidyverse)
# robust pairwise comparisons
x <- lincon(libido ~ dose, data = viagra, tr = 0.1)
#Transform to tibble
df1 <- suppressMessages(as_tibble(x$comp, .name_repair = "unique")) %>%
dplyr::rename(group1 = Group...1, group2 = Group...2)
#Extract labels
df2 <- tibble(treat=x$fnames) %>% mutate(value=1:n())
#Format to long df1
df1 <- df1 %>%
mutate(id=1:n()) %>%
pivot_longer(cols = c(group1,group2)) %>%
rename(group=name) %>% left_join(df2) %>% select(-value) %>%
pivot_wider(names_from = group,values_from=treat) %>% select(-id)
Output:
# A tibble: 3 x 6
psihat ci.lower ci.upper p.value group1 group2
<dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 -1 -3.44 1.44 0.260 placebo low
2 -2.8 -5.54 -0.0638 0.0491 placebo high
3 -1.8 -4.54 0.936 0.173 low high

Run an aov test through a tibble in a tidy way

I want to fit a linear model (ANOVA) on a data frame for several predictors, all with the same dependent variable. A similar question was solved here. The problem is that the aov function, which implements ANOVA, doesn't accept x and y as arguments (as far as I know). Is there a way to run the analysis in a tidy way? So far I've tried something like:
library(tidyverse)
iris %>%
as_tibble() %>%
select(Sepal.Length, Species) %>%
mutate(foo_a = as_factor(sample(c("a", "b", "c"), nrow(.), replace = T)),
foo_b = as_factor(sample(c("d", "e", "f"), nrow(.), replace = T))) %>%
map(~aov(Sepal.Length ~ .x, data = .))
Created on 2019-02-12 by the reprex package (v0.2.1)
The desired output is three analyses: Sepal.Length against Species, Sepal.Length against foo_a, and Sepal.Length against foo_b. Is this possible, or am I totally wrong?
One approach is to make this into a long-shaped data frame, group by the independent variable of interest, and use the "many models" approach. I usually prefer something like this over trying to do tidyeval across multiple columns; it just gives me a clearer sense of what's going on.
To save space, I'm working with iris_foo, which is your data as you created it up through the two mutate lines. Putting it into long format gives you a key column holding the names of the three columns that will be used as independent variables in the aov calls.
library(tidyverse)
iris_foo %>%
gather(key, value, -Sepal.Length)
#> # A tibble: 450 x 3
#> Sepal.Length key value
#> <dbl> <chr> <chr>
#> 1 5.1 Species setosa
#> 2 4.9 Species setosa
#> 3 4.7 Species setosa
#> 4 4.6 Species setosa
#> 5 5 Species setosa
#> 6 5.4 Species setosa
#> 7 4.6 Species setosa
#> 8 5 Species setosa
#> 9 4.4 Species setosa
#> 10 4.9 Species setosa
#> # … with 440 more rows
From there, nest by key and create a new list-column of ANOVA models. This will be a list of aov objects. For simplicity with getting your models back out, you can drop the data column.
aov_models <- iris_foo %>%
gather(key, value, -Sepal.Length) %>%
group_by(key) %>%
nest() %>%
mutate(model = map(data, ~aov(Sepal.Length ~ value, data = .))) %>%
select(-data)
aov_models
#> # A tibble: 3 x 2
#> key model
#> <chr> <list>
#> 1 Species <S3: aov>
#> 2 foo_a <S3: aov>
#> 3 foo_b <S3: aov>
From there, you can work with the models however you like. They're accessible in the list aov_models$model. Printed, they look how you'd expect. For example, the first model:
aov_models$model[[1]]
#> Call:
#> aov(formula = Sepal.Length ~ value, data = .)
#>
#> Terms:
#> value Residuals
#> Sum of Squares 63.21213 38.95620
#> Deg. of Freedom 2 147
#>
#> Residual standard error: 0.5147894
#> Estimated effects may be unbalanced
To see all the models, call aov_models$model %>% map(print). You might also want to use broom functions, such as broom::tidy or broom::glance, depending on how you need to present the models.
aov_models$model %>%
map(broom::tidy)
#> [[1]]
#> # A tibble: 2 x 6
#> term df sumsq meansq statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 value 2 63.2 31.6 119. 1.67e-31
#> 2 Residuals 147 39.0 0.265 NA NA
#>
#> [[2]]
#> # A tibble: 2 x 6
#> term df sumsq meansq statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 value 2 0.281 0.141 0.203 0.817
#> 2 Residuals 147 102. 0.693 NA NA
#>
#> [[3]]
#> # A tibble: 2 x 6
#> term df sumsq meansq statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 value 2 0.756 0.378 0.548 0.579
#> 2 Residuals 147 101. 0.690 NA NA
Or, to tidy all the models into a single data frame that keeps the key column, you could do:
aov_models %>%
mutate(model_tidy = map(model, broom::tidy)) %>%
unnest(model_tidy)
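A wide-format alternative, sketched here, is to map over the predictor names and build each formula with reformulate(), which avoids the gather step and keeps each predictor's own name as the model term. The seed and the random foo_a/foo_b columns are hypothetical stand-ins for iris_foo, so the foo results will differ from those printed above.

```r
library(dplyr)
library(purrr)
library(broom)

set.seed(42)
# Rebuild a version of iris_foo; foo_a and foo_b are random, as in the question
iris_foo <- iris %>%
  select(Sepal.Length, Species) %>%
  mutate(foo_a = factor(sample(c("a", "b", "c"), n(), replace = TRUE)),
         foo_b = factor(sample(c("d", "e", "f"), n(), replace = TRUE)))

predictors <- setdiff(names(iris_foo), "Sepal.Length")

# One aov per predictor; set_names() keeps each predictor name as the list name
aov_list <- predictors %>%
  set_names() %>%
  map(~ aov(reformulate(.x, response = "Sepal.Length"), data = iris_foo))

# Tidy every model and record which predictor it came from
aov_tidy <- map_dfr(aov_list, tidy, .id = "predictor")
aov_tidy
```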
