I am trying to use purrr::pmap() to apply a custom function in a rowwise fashion along some dataframe rows. I can achieve my desired end result with a for-loop and with apply(), but when I try to use pmap() I can only get the result I want in combination with mutate(), which in my real-life applied case will be insufficient.
Is there a way to use pmap() to apply my custom function and just have the output print rather than be stored in a new column?
library(dplyr)
library(purrr)
library(tibble)
Create demo data & custom function
set.seed(57)
ds_mt <-
mtcars %>%
rownames_to_column("model") %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select(model, mpg, wt, cyl, am, vs) %>%
sample_n(3)
foo <- function(model, am, mpg){
print(
paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
)
}
Successful example of rowwise for-loop:
for (row in 1:nrow(ds_mt)) {
foo(
model = ds_mt[row, "model"],
am = ds_mt[row, "am"],
mpg = ds_mt[row, "mpg"]
)
}
Successful example using apply():
row.names(ds_mt) <- NULL # to avoid named vector as output
apply(
ds_mt,
MARGIN = 1,
FUN = function(ds)
foo(
model = ds["model"],
am = ds["am"],
mpg = ds["mpg"]
)
)
Example using pmap() within mutate() that is almost what I need.
ds_mt %>%
mutate(new_var =
pmap(
.l =
list(
model = model,
am = am,
mpg = mpg
),
.f = foo
))
FAILING CODE: Why doesn't this work?
ds_mt %>%
pmap(
.l =
list(
model = model,
am = am,
mpg = mpg
),
.f = foo
)
So after some more reading it seems this is a case for pwalk() rather than pmap(), because I am trying to get output to print (i.e., a side effect) rather than to be stored in a dataframe.
library(dplyr)
library(purrr)
library(tibble)
set.seed(57)
ds_mt <-
mtcars %>%
rownames_to_column("model") %>%
mutate(
am = factor(am, labels = c("auto", "manual")),
vs = factor(vs, labels = c("V", "S"))
) %>%
select(model, mpg, wt, cyl, am, vs) %>%
sample_n(3)
foo <- function(model, am, mpg){
print(
paste("The", model, "has a", am, "transmission and gets", mpg, "mpgs.")
)
}
ds_mt %>%
select(model, am, mpg) %>%
pwalk(
.l = .,
.f = foo
)
Related
I have a list of linear regression fit objects. Let's create it in this example by:
c('hp', 'wt', 'disp') %>%
paste('mpg ~', .) %>%
map(as.formula) %>%
map(lm, data = mtcars)
What I would like is to keep just the residuals and fitted.values from each of the regression fit objects, within this same pipeline. I was trying to use the keep function, but it doesn't work:
c('hp', 'wt', 'disp') %>%
paste('mpg ~', .) %>%
map(as.formula) %>%
map(lm, data = mtcars) %>%
map(keep, names(.) %in% c("residuals", "fitted.values"))
Error:
Error in probe(.x, .p, ...) : length(.p) == length(.x) is not TRUE
How can I perform this action?
If a data frame is wanted as output then use the code below or if a list is wanted omit the bind_rows line.
library(dplyr)
library(purrr)
nms <- c('hp', 'wt', 'disp')
out <- nms %>%
set_names(x = map(paste('mpg ~', .), as.formula)) %>%
map(lm, data = mtcars) %>%
map(~ data.frame(fit = fitted(.), resid = resid(.))) %>%
bind_rows(.id = "id")
We can simplify this slightly using sapply as it will add names and use reformulate to generate the formula.
out <- nms %>%
sapply(reformulate, response = "mpg") %>%
map(lm, data = mtcars) %>%
map(~ data.frame(fit = fitted(.), resid = resid(.))) %>%
bind_rows(.id = "id")
i'm having the following problem:
Context:
I'm using gtsummary to explore frequencies in a dataframe using cross variables.
Here's my desire output:
So that i have a main variable tobgp and its cross by multiple variables like agegp and algp
Attempt:
this is what i've done so far. Using the esoph data from the package The R Datasets Package (datasets).
pacman::p_load(tidyverse, gt, gtsummary)
multiple_table<-function(data, var){
t0<- data %>%
select({{var}}) %>%
gtsummary::tbl_summary(statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0))) %>%
modify_header(label ~ "") %>%
bold_labels()
#agep
t1<-data %>%
select({{var}}, agegp) %>%
gtsummary::tbl_summary(by = agegp, statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0)))
#alcgp
t2<-data %>%
select({{var}}, alcgp) %>%
gtsummary::tbl_summary(by = alcgp, statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0)))
#MERGE
tbl_merge(tbls = list(t0,t1,t2),
tab_spanner = c("**Total**", "**agegp**", "**algp**")) %>%
as_gt() %>%
gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
}
esoph %>%
multiple_table(tobgp)
The problem with my code so far is that is specific for the crosses, to add more cross variables i have to modify the function i created which is not so friendly.
Request:
Create a function so that you can create the desire output with one line of code. Like this for example:
multiple_table(data, main, by)
esoph %>%
multiple_table(main=tobgp, by=c(agegp, algp)
So that if i want to use other variables to cross by i only have to change the by=c() argument.
In order to be easy to do something like:
esoph %>%
multiple_table(main=tobgp, by=c(agegp, algp, variable1, variable2)
Notes:
I've tried other functions inside gtsummary like tbl_strata which can use two variables as crosses, but doesn't suit my needs because it mixes the two cross variables like this:
This is not what i'm looking for. As you can see, Grade divides the percentage of Drug test by each Grade. This example is taken from gtsummary vignette: https://www.danieldsjoberg.com/gtsummary/reference/tbl_strata.html
I think the solution for my problem could involve some workaround with purrr, or apply, i've tried some but i'm not very good using lists and iterations.
That's it. Thanks very much for listening and i hope i've been very clear. If not, feel free to ask.
Answers 28/03/22
Since i posted my question i've recieve to different approach answers which both work perfectly. Feel free to use the one that suits you. Thanks Mike for the answer in StackOverflow and thanks Tan, June C, Tyler Grant Smith for the answer in the Slack R4DS Community. In my case i would stick with the approach 3.
Approach 1: The Mike approach
library(gtsummary)
library(dplyr)
esoph <- mutate(esoph,
ncases = ifelse(ncases > 2, "High","Low"))
multiple_table<-function(data, var, vars){
t0 <- data %>%
select( var ) %>%
gtsummary::tbl_summary(statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0))) %>%
modify_header(label ~ "") %>%
bold_labels()
tlist <- lapply(vars,function(y){
data %>%
select( var , y ) %>%
gtsummary::tbl_summary(by = y , statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0)))
})
tabspannername <- c("**Total**", paste0("**",vars,"**"))
tlist2 <- append(list(t0), tlist,1)
tbl_merge(tbls = tlist2
,tab_spanner = tabspannername
) %>%
as_gt() %>%
gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
}
multiple_table(data = esoph, var = "tobgp", vars = c("agegp", "alcgp","ncases"))
Approach 2: The Tan approach
library(tidyverse)
library(gt)
library(gtsummary)
esoph
fn_subtable <- function(data, main, sub){
data %>%
dplyr::select({{main}},{{sub}}) %>%
gtsummary::tbl_summary(
by = {{sub}},
statistic = gtsummary::all_categorical()~ "{p}% ({n})",
digits = list(dplyr::everything() ~ c(2, 0)))
}
fn_table <-function(data, main_var, sub_vars){
t0 <- data %>%
dplyr::select({{main_var}}) %>%
gtsummary::tbl_summary(statistic = gtsummary::all_categorical() ~ "{p}% ({n})",
digits = list(dplyr::everything() ~ c(2, 0))) %>%
gtsummary::modify_header(label ~ "") %>%
gtsummary::bold_labels()
sub_tables <- purrr::map(sub_vars, ~fn_subtable(data = data, main = main_var, sub = .x))
#MERGE
tbls <- c(list(t0), sub_tables) %>%
gtsummary::tbl_merge(tab_spanner = c("**Total**", paste0("**",sub_vars,"**"))) %>%
gtsummary::as_gt() %>%
gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
tbls
}
esoph %>% fn_table("tobgp", c("agegp", "alcgp"))
Approach 3: The June C - Tyler Grant Smith approach
library(tidyverse)
library(gt)
library(gtsummary)
fn_subtable <- function(data, main, sub){
data %>%
dplyr::select({{main}},{{sub}}) %>%
gtsummary::tbl_summary(
by = {{sub}},
statistic = gtsummary::all_categorical()~ "{p}% ({n})",
digits = list(dplyr::everything() ~ c(2, 0)))
}
fn_table3 <- function(data, main_var, sub_vars){
main_var <- rlang::enexpr(main_var)
sub_vars_expr <- rlang::enexpr(sub_vars) # 1. Capture `list(...)` call as expression
sub_vars_args <- rlang::call_args(sub_vars_expr) # 2. Pull out the arguments (they're now also exprs)
sub_vars_fn <- rlang::call_fn(sub_vars_expr) # 3. Pull out the fn call
# 4. Evaluate the fn with expr-ed arguments (this becomes `list( expr(agegp), expr(alcgp) )` )
sub_vars_reconstructed <- rlang::exec(sub_vars_fn, !!!sub_vars_args)
# --- sub_vars replaced with sub_vars_reconstructed from here onwards ---
t0 <- data %>%
dplyr::select({{main_var}}) %>%
gtsummary::tbl_summary(statistic = gtsummary::all_categorical() ~ "{p}% ({n})",
digits = list(dplyr::everything() ~ c(2, 0))) %>%
gtsummary::modify_header(label ~ "") %>%
gtsummary::bold_labels()
sub_tables <- purrr::map(sub_vars_reconstructed, ~fn_subtable(data = data, main = main_var, sub = .x))
tbls <- c(list(t0), sub_tables) %>%
gtsummary::tbl_merge(tab_spanner = c("**Total**", paste0("**",sub_vars_reconstructed,"**"))) %>%
gtsummary::as_gt() %>%
gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
tbls
}
fn_table3(esoph,tobgp,list(agegp,alcgp))
Thanks very much and i hope this could be implemented as a function inside the gtsummary package because is very useful to explore frequencies with different cross variables.
you are pretty close and only needed a few modifications. the major change is adding in an lapply() to loop through the vars input to create a list of tbl_summary objects. Then I create the tab spanner names from the inputs of vars and append the t0 table to the list created by the lapply(). then you can pass tlist2 to tbl_merge() with the names created with tabspannername to dynamically label the tables.
library(gtsummary)
library(dplyr)
esoph <- mutate(esoph,
ncases = ifelse(ncases > 2, "High","Low"))
multiple_table<-function(data, var, vars){
t0 <- data %>%
select( var ) %>%
gtsummary::tbl_summary(statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0))) %>%
modify_header(label ~ "") %>%
bold_labels()
tlist <- lapply(vars,function(y){
esoph %>%
select( var , y ) %>%
gtsummary::tbl_summary(by = y , statistic = all_categorical()~ "{p}% ({n})",
digits = list(everything() ~ c(2, 0)))
})
tabspannername <- c("**Total**", paste0("**",vars,"**"))
tlist2 <- append(list(t0), tlist,1)
tbl_merge(tbls = tlist2
,tab_spanner = tabspannername
) %>%
as_gt() %>%
gt::tab_source_note(gt::md("*Fuente: Empresa1*"))
}
x <- multiple_table(data = esoph, var = "tobgp", vars = c("agegp", "alcgp","ncases"))
I don't know how to order columns in tidyHtmlTable function from htmlTable package.
The documentation says:
Columns are sorted by arrange(cgroup,header) where cgroup will be expanded to the columns
of the cgroup argument, e.g. cgroup = c(a, b), header = c will become arrange(a,b,c). If you
want to sort in non-alphabetic order you can provide a factor variable and that information will be retained.
How can I provide such factor variable? For example, in the doc example
library(magrittr)
library(tidyr)
library(dplyr)
library(htmlTable)
library(tibble)
td <- mtcars %>%
as_tibble(rownames = "rnames") %>%
pivot_longer(names_to = "per_metric",
cols = c(hp, mpg, qsec))
tidy_summary <- td %>%
group_by(cyl, gear, per_metric) %>%
summarise(Mean = round(mean(value), 1),
SD = round(sd(value), 1),
Min = round(min(value), 1),
Max = round(max(value), 1),
.groups = 'drop') %>%
pivot_longer(names_to = "summary_stat",
cols = c(Mean, SD, Min, Max)) %>%
ungroup() %>%
mutate(gear = paste(gear, "Gears"),
cyl = paste(cyl, "Cylinders"))
tidy_summary %>%
arrange(per_metric, summary_stat) %>%
addHtmlTableStyle(align = "r") %>%
tidyHtmlTable(header = gear,
cgroup = cyl,
rnames = summary_stat,
rgroup = per_metric)
if I want to maintain the order of rgroup and rnames but, for example, I want the columns in non-alphabetical order, like this:
4 Gears |3 Gears |5 Gears
How can I do that? Thank you.
What we can do is to make the changes before tidyHtmlTable
How it works:
mutate gear to factor class
Use fct_relevel from forcats package (it is in tidyverse)
library(forcats)
tidy_summary %>%
arrange(per_metric, summary_stat) %>%
addHtmlTableStyle(align = "r") %>%
mutate(gear = as.factor(gear),
gear = fct_relevel(gear, "4 Gears", "3 Gears", "5 Gears")) %>%
tidyHtmlTable(header = gear,
cgroup = cyl,
rnames = summary_stat,
rgroup = per_metric)
In the following code, I want to replace map_dfr from purrr with one of the SparkR apply functions to parallelize the Shapley calculations on the azure databricks:
#install.packages("randomForest"); install.packages("tidyverse"); install.packages("iml"); install.packages(SparkR)
library(tidyverse); library(iml); library(randomForest); library(SparkR)
mtcars1 <- mtcars %>% mutate(vs = as.factor(vs), id = row_number())
x <- "vs"
y <- paste0(setdiff(setdiff(names(mtcars1), "vs"), "id"), collapse = "+")
rf = randomForest(as.formula(paste0(x, "~ ", y)), data = mtcars1, ntree = 50)
predictor <- Predictor$new(rf, data = mtcars1, y = mtcars1$vs)
shapelyresults <- map_dfr(1:nrow(mtcars), ~(Shapley$new(predictor, x.interest = mtcars1[.x,]) %>%
.$results %>%
as_tibble() %>%
arrange(desc(phi)) %>%
slice(1:5) %>%
select(feature.value, phi) %>%
mutate(id = .x)))
I could not leverage the answer on the following link: How to apply a function to each row in SparkR?
I am trying to use quosures to pass along variable names within a custom function for data processing and use in a formula, but my use of quosures in the formula is not correct. Is there a better way to unquote arguments within a formula?
library(dplyr)
library(broom)
library(purrr)
library(tidyr)
foo <- function(mydata, dv, iv, group_var) {
dv = enquo(dv)
iv = enquo(iv)
group_var = enquo(group_var)
mydata <- mydata %>%
group_by(!!group_var) %>%
nest()
mydata %>%
mutate(model = map(data,
~summary(lm(formula(substitute(dv ~ iv)), data = .))
)) %>%
unnest(model %>% map(tidy))
}
foo(mydata=mtcars, dv=mpg, iv=wt, group_var=cyl)
My code produces "Error in mutate_impl(.data, dots) : Evaluation error: object is not a matrix."
This is a working version of code I am trying to make into a function:
mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(model = map(data, ~summary(lm(mpg ~ wt, data = .)))) %>%
unnest(model %>% map(tidy))
You need to use base R nonstandard evaluation with functions like lm which are not "in the tidyverse" so to speak.
So you could change things to:
foo <- function(mydata, dv, iv, group_var) {
flma <- as.formula(paste(substitute(dv), "~", substitute(iv)))
group_var = enquo(group_var)
mydata <- mydata %>%
group_by(!!group_var) %>%
nest()
mydata %>%
mutate(model = map(data, ~summary(lm(flma, data = .)))) %>%
unnest(model %>% map(tidy))
}
foo(mtcars, mpg, wt, cyl)
That's fine if you know you are only doing simple regression. For more flexibility just pass the formula directly, as in:
foo2 <- function(mydata, flma, group_var) {
group_var = enquo(group_var)
mydata <- mydata %>%
group_by(!!group_var) %>%
nest()
mydata %>%
mutate(model = map(data, ~summary(lm(flma, data = .)))) %>%
unnest(model %>% map(tidy))
}
foo(mtcars, mpg ~ wt, cyl)