Vectorised wilcox.test function dplyr - r

I have tried to create a vectorised version of the wilcox.test function in R. It doesn't return the same p values as the original wilcox.test function. Does anyone understand why ?
library(tidyverse)
vect_wilcox <- function(df, grouping_variable, x, y){
df %>%
group_by({{grouping_variable}}) %>%
group_map(~ broom::tidy(wilcox.test(pull(.x[x]), pull(.x[y])), paired = TRUE)) %>%
enframe %>%
unnest(value) %>%
mutate(df %>% distinct({{grouping_variable}}))
}
df <- tribble(
~session, ~name_var, ~time_pt1, ~time_pt2,
1, "fio2", 90, NA,
2, "fio2", 100, 80,
3, "fio2", 100, 70,
4, "fio2", 90, 70,
1, "ph", 7.24, NA,
2, "ph", 7.19, 7.38,
3, "ph", 7.2, 7.2,
4, "ph", 7.37, 7.33
)
new_wilcox <- vect_wilcox(df, grouping_variable = name_var, x= "time_pt1", y="time_pt2")
d3 <- df %>%
pivot_longer(col = 3:4, names_to = "time_point", values_to = "value") %>%
pivot_wider(
names_from = c(name_var, time_point),
values_from = value,
names_sep = "_")
pval = format(wilcox.test(d3$fio2_time_pt1,d3$fio2_time_pt2,paired=T)$p.value,digits=3)
pval = c(pval,format(wilcox.test(d3$ph_time_pt1,d3$ph_time_pt2,paired=T)$p.value,digits=3))
(comp<- new_wilcox %>%
select(name_var, p.value) %>%
mutate(old_p.value = pval) %>%
rename(new_p.value = p.value))
dput(comp)
Thanks a lot !

There were two errors:
a parenthesis error in group_map
the variable reassignement with mutate(df %>% distinct({{grouping_variable}}))obviously didn't work so I changed group_map all together for group_modify.
This function works:
vect_wilcox <- function(df, grouping_variable, x, y){
df %>%
group_by({{grouping_variable}}) %>%
group_modify(~ broom::tidy(
wilcox.test(
x = pull(.x[x]),
y = pull(.x[y]),
paired = TRUE))) %>%
ungroup()
}

Related

Sideways/vertical spanner label in gt tables in R

I have two dataframes that I want to turn into a single table using gt
library(dplyr)
library(gt)
a <- rnorm(21, mean = 112, sd =12)
colour <- rep(c("Blue", "Red", "Green"), 7)
data <- data.frame(colour, a)
data <- data %>%
group_by(colour) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
ungroup() %>%
gt()
a <- rnorm(21, mean = 60, sd =12)
day <- rep(c("2", "4", "6"), 7)
data2 <- data.frame(day, a)
data2 <- data2 %>%
group_by(day) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
ungroup() %>%
gt()
How do I stack the two dataframes on top of each other, and apply two sideways spanner labels of colour and day. Something similar to below where 2014, 2015 are my mean and sd columns, and China is colour, with blue, red green underneath, and India is day with the days stacked underneath.
OR (for curiosity/ideally).
Not have colour and day where China and India are, but instead have a sideways spanner. (i.e. vertical instead of horizontal). Horizontal isn't good for my real data as there is too many categories and would make it a really wide table.
You can give each a variable to identify what the group will be ("grp") and rename the color/day variable ("cat") and use bind_rows to combine them before using gt:
a <- rnorm(21, mean = 112, sd =12)
colour <- rep(c("Blue", "Red", "Green"), 7)
data <- data.frame(colour, a) %>%
group_by(colour) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = "colour") %>%
rename(cat = colour)
b <- rnorm(21, mean = 60, sd =12)
day <- rep(c("2", "4", "6"), 7)
data2 <- data.frame(day, b) %>%
group_by(day) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = "day") %>%
rename(cat = day)
bind_rows(data, data2) %>%
group_by(grp) %>%
gt(rowname_col = "cat")
Also I don't think this is exactly what you are asking for on the second option, but there is a row_group.as_column option for tab_options:
bind_rows(data, data2) %>%
group_by(grp) %>%
gt() %>%
tab_options(row_group.as_column = TRUE)

Change row group labels in gt table (with superscript/subscript and line breaks). Customising row group labels in R

I have the following data and table:
library(gt)
library(dplyr)
a <- rnorm(21, mean = 112, sd =12)
colour <- rep(c("Blue", "Red", "Green"), 7)
data <- data.frame(colour, a) %>%
group_by(colour) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = html("[H<sub>2</sub>O]")) %>%
rename(cat = colour)
b <- rnorm(21, mean = 60, sd =12)
day <- rep(c("2", "4", "6"), 7)
data2 <- data.frame(day, b) %>%
group_by(day) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = html("[H<sub>2</sub>O] Additition <br> (Days)")) %>%
rename(cat = day)
bind_rows(data, data2) %>%
group_by(grp) %>%
gt(rowname_col = "cat")
bind_rows(data, data2) %>%
group_by(grp) %>%
gt() %>%
tab_options(row_group.as_column = TRUE)
The row group labels appear literally as '[H<sub>2<\sub>O]', rather than [H2O] etc. It is likely that I am using HTML wrong and it needs to be used with another package/function. I have also tried using cols_label but doesn't recognise these as columns in the dataframe.
Is there also a way to have the row groups column vertically centered, rather than at the top where is currently is? How do you bold these row groups?
The html function won't work outside of a gt table, so you'll have to create the row groups using tab_row_group and add the html labels there.
data <- data.frame(colour, a) %>%
group_by(colour) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = "color") %>%
rename(cat = colour)
data2 <- data.frame(day, b) %>%
group_by(day) %>%
summarise(mean = mean(a), sd = sd(a), n = n()) %>%
mutate(grp = "day") %>%
rename(cat = day)
bind_rows(data, data2) %>%
gt() %>%
tab_row_group(
label = html("[H<sub>2</sub>O]"),
rows = grp == "color"
) %>%
tab_row_group(
label = html("[H<sub>2</sub>O] Additition <br> (Days)"),
rows = grp == "day"
) %>%
cols_hide(grp)

pivot wider - create two levels of headers for better readability

There are six regression models. I used pivot wider but it is difficult to read.
Can I use two level of headers -
first level - regression model
second level - estimate, tstat
library(dplyr)
regression <- c(rep("A", 3), rep("B", 3), rep("C", 3), rep("D", 3), rep("E", 3), rep("F", 3))
attribute <- rep(c("b0", "b1", "b2"), 6)
estimate <- round(runif(n = 18, min = 0, max = 10), 2)
tstat <- round(runif(n = 18, min = 0, max = 10), 2)
# tibble
tbl <- tibble(regression, attribute, estimate, tstat)
# pivot wider
tbl <- tbl %>%
pivot_wider(names_from = regression,
values_from = c("estimate", "tstat"))
One option is separate_header from ftExtra
library(ftExtra)
library(dplyr)
library(tidyr)
library(stringr)
tbl %>%
pivot_wider(names_from = regression,
values_from = c("estimate", "tstat")) %>%
rename_with(~ str_replace(., "(.*)_(.*)", "\\2_\\1"), -1) %>%
as_flextable() %>%
separate_header()
-output
Or may use span_header
library(flextable)
tbl %>%
pivot_wider(names_from = regression,
values_from = c("estimate", "tstat")) %>%
rename_with(~ str_replace(., "(.*)_(.*)", "\\2_\\1"), -1) %>%
select(attribute, order(str_remove(names(.)[-1], "_.*")) + 1) %>%
as_flextable() %>%
span_header() %>%
align(align = "center", part = "all")
-output
If we need to make some column bold,
tbl %>%
pivot_wider(names_from = regression,
values_from = c("estimate", "tstat")) %>%
rename_with(~ str_replace(., "(.*)_(.*)", "\\2_\\1"), -1) %>%
mutate(across(ends_with('tstat'), ~sprintf('**%.2f**', .))) %>%
select(attribute, order(str_remove(names(.)[-1], "_.*")) + 1) %>%
as_flextable() %>%
span_header() %>%
align(align = "center", part = "all") %>%
colformat_md()
-output

Vectorization to extract and bind very nested data

I have some very nested data. Within my list-column-dataframes, there are some pieces I need to put together and I've done so in a single instance to get my desired dataframe:
a <- df[[2]][["result"]]#data
b <- df[[2]][["result"]]#coords
desired_df <- cbind(a, b)
My original Large list has 171 elements, meaning I have 1:171 (3.3 GB) to go inside those square brackets and would ideally end up with 171 desired dataframes (which I would then bind all together).
I haven't needed to write a loop in 10 years, but I don't see a tidyverse way to deal with this. I also no longer know how to write loops. There are definitely some elements in there that are junk and will fail.
You haven't provided any sort of minimal example of the data.
I've condensed it to mean something like this
base_data <- data.frame(group = c("a", "b", "c"), var1 = c(3, 1, 2),
var2 = c( 2, 4, 8))
base_data2 = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
rownames(base_data2) = c("d", "e", "f")
methods::setClass(
"weird_object",
slots = c(data = "data.frame", coords = "matrix"),
prototype = list(data = base_data, coords = base_data2)
)
df <- list(
list(
result = new("weird_object")
),list(
result = new("weird_object")
),list(
result = new("weird_object")
),list(
result = new("weird_object")
)
)
And if I had such a list with these objects, then I could do
df %>%
map(. %>% {
list(data = .$result#data,
cooords = .$result#coords)
}) %>%
enframe() %>%
unnest_wider(value)
But the selecting / hoisting function might fail, thus
one can wrap it in a purrr::possibly, and
choose a reasonable default:
df %>%
map(possibly(. %>% {
list(data = .$result#data,
cooords = .$result#coords)
},
otherwise = list(data = NA, coords = NA))) %>%
enframe() %>%
unnest_wider(value)
Hopefully, this could be a step forward.
Next step is probably something resembling this:
df %>%
map(. %>% {
list(data = .$result#data,
coords = .$result#coords)
}) %>%
enframe() %>%
unnest_wider(value) %>%
mutate(coords = coords %>% map(. %>% as_tibble(rownames = "rowid"))) %>%
unnest(cols = c(data, coords)) %>%
#' rotating the thing now
pivot_longer(cols = c(group, rowid),
names_to = "var_name",
values_to = "var") %>%
select(-var_name) %>%
pivot_longer(cols = c(var1, var2, V1, V2, V3),
names_to = "var_name") %>%
pivot_wider(names_from = var, values_from = value) %>%
identity()
If I understand your data structure, which I probably don't, you could do:
library(tidyverse)
# Create dummy data
df <- mtcars
df$mpg <- list(result = I(list('test')))
df$mpg$result <- list("#data" = I(list('your data')))
df <- df %>% select(mpg, cyl)
df1 <- df
df2 <- df
# Pull data you're interested in.
# The index is 1 here, instead of 2, because it's fake data and not your data.
# Assuming the # is not unique, and is just parsed from JSON or some other format.
dont_at_me <- function(x){
a <- x[[1]][["result"]][["#data"]]
a
}
# Get a list of all of your data.frames
all_dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
# Vectorize
purrr::map(all_dfs, ~dont_at_me(.))

R sf::st_convex_hull() losing sf class on data with multiple groups

I am trying to create a 95% minimum convex polygon using sf in R. My code works fine as long as I only group my data on 1 variable, but when I group on two variables, the output loses its sf class and becomes a grouped_df instead.
Here is a toy dataset as an example
library(dplyr)
set.seed(12)
toy <- tibble::tibble(
ID = rep(c(1,2), each = 10),
year = rep(c(1,2), 10),
lat = runif(20, 1, 10),
long = runif(20, 1, 10)
) %>%
sf::st_as_sf(., coords = c("long", "lat"))
toy %>%
group_by(ID) %>%
summarize(.groups = "keep") %>%
mutate(cent = sf::st_centroid(geometry)) %>%
sf::st_cast(to = "POINT") %>%
mutate(dist = sf::st_distance(geometry, cent, by_element = TRUE)) %>%
filter(dist <= quantile(dist, .95)) %>%
summarize() %>%
sf::st_convex_hull() %>%
class()
This gives the output I want. But when I try to group by two variables, the result loses the sf class.
toy %>%
group_by(ID, year) %>%
summarize(.groups = "keep") %>%
mutate(cent = sf::st_centroid(geometry)) %>%
sf::st_cast(to = "POINT") %>%
mutate(dist = sf::st_distance(geometry, cent, by_element = TRUE)) %>%
filter(dist <= quantile(dist, .95)) %>%
summarize() %>%
sf::st_convex_hull() %>%
class
Is there something in my code keeping me from being able to group on two variables?
It's because the second summarize is regrouping on ID. You need a .groups = "keeps" there to pass the same grouping through and then convert back to an sf object. Alternatively you could create a grouping variable mutate(grp = paste0(ID, year)) and do group_by(grp) .
toy %>%
group_by(ID, year) %>%
summarize(.groups = "keep") %>%
mutate(cent = sf::st_centroid(geometry)) %>%
sf::st_cast(to = "POINT") %>%
mutate(dist = sf::st_distance(geometry, cent, by_element = TRUE)) %>%
filter(dist <= quantile(dist, .95)) %>%
summarize(.groups = "keep") %>%
sf::st_convex_hull() %>%
st_sf()

Resources