Can map() take functions with multiple inputs? - r

I want to loop glm/lm over multiple outcomes and predictors, stratified by group. The nest() and map() functions (from tidyr and purrr) seem to provide an elegant solution for stratified analysis. However, when I use a custom function that takes multiple inputs, map() doesn't seem to work.
In almost all the tutorials on purrr's map() that I have seen, the regression examples are static: the dependent and independent variables are hard-coded in the function. Because I want to loop over dozens of outcomes and predictors, I am trying to write a wrapper around glm()/lm() that can iterate over different combinations.
library(dplyr)
library(broom)
library(tidyr)
library(purrr)
# example data set
set.seed(20)
df <- data.frame(
out = rep(c(0,1),5),
pre = sample(c(1:4),10,replace = TRUE),
var1 = sample(c(1:2),10,replace = TRUE),
var2 = sample(c(1:50),10,replace = TRUE),
group = sample(c(1:2),10,replace = TRUE)
)
explicit_fun<-function(data){
glm(out ~ pre + var1 + var2, data=data, family = binomial())
}
input_fun<-function(data, outcome, predictor, covariate){
glm(as.formula(paste(outcome,"~",predictor,"+",paste(covariate,collapse = "+"))),data=data,family = binomial())
}
# nesting the data set
df_by_group<-df%>%
group_by(group)%>%
nest()
It works fine with the explicit function:
models <- df_by_group%>%
mutate(mod=purrr::map(data,explicit_fun))
models <- models%>%
mutate(
glance_glm=purrr::map(mod,broom::glance),
tidy_glm=purrr::map(mod,broom::tidy),
augment_glm=purrr::map(mod,broom::augment)
)
unnest(models,data)
unnest(models,glance_glm,.drop = TRUE)%>% View()
unnest(models,tidy_glm) %>% View()
It stops working when the function takes multiple inputs:
models<-df_by_group%>%
mutate(mod=purrr::map(data,input_fun(data=.,outcome="out",predictor="pre",covariate=c("var1","var2"))))
I expected input_fun to work the same way as explicit_fun, but I received the following error message:
Error in mutate_impl(.data, dots) :
Evaluation error: Can't convert a `glm/lm` object to function
Call `rlang::last_error()` to see a backtrace.

You need to pass a function to map(). Right now you are calling input_fun() in the second argument, so map() receives its result (a glm object) instead of a function, which is what the error message is complaining about. The quickest fix is to use the formula syntax to create a function. Try
models <- df_by_group%>%
mutate(mod=purrr::map(data, ~input_fun(data=.,outcome="out",predictor="pre",covariate=c("var1","var2"))))
This delays the evaluation of input_fun till the map actually happens and properly fills in the . value.
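As a minimal sketch of the same idea outside the nested data frame, the three calls below should be equivalent ways of handing map() a function that needs extra arguments. Note that fit_fun and the mtcars split are stand-ins chosen for illustration; they are not part of the original question.

```r
library(purrr)

# Hypothetical stand-in for input_fun: builds the formula from strings.
fit_fun <- function(data, outcome, predictor) {
  lm(reformulate(predictor, response = outcome), data = data)
}

# One data frame per group, similar in spirit to group_by() %>% nest().
by_cyl <- split(mtcars, mtcars$cyl)

# 1. Formula shorthand: `.` stands for the current element.
m_formula <- map(by_cyl, ~ fit_fun(., outcome = "mpg", predictor = "wt"))

# 2. Explicit anonymous function.
m_anon <- map(by_cyl, function(d) fit_fun(d, outcome = "mpg", predictor = "wt"))

# 3. Extra arguments forwarded through map()'s `...`.
m_dots <- map(by_cyl, fit_fun, outcome = "mpg", predictor = "wt")
```

In the third form the current element is passed as the first unnamed argument, so it lands in data because the other arguments are supplied by name.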

Related

Problem using 'anova_fun' argument in 'add_global_p()'

I'm trying to display analysis-of-variance results for univariate regressions in a table created with gtsummary::tbl_uvregression(), using a function other than car::Anova(), which cannot handle every type of model. In my case I would like to use Anova.clm() from the RVAideMemoire package, which is based directly on car::Anova() and built specifically for clm(m) objects.
Unfortunately, every time I use the anova_fun argument of add_global_p(), I get an error telling me that mod (the model) should be specified in the function with no default value. The documentation of add_global_p() does say that the anova_fun function is used "in place of car::Anova()" and that it must accept a model as one of its arguments. But as far as I know, Anova.clm() does accept one.
Moreover, even when I pass car::Anova() itself through the anova_fun argument, I still get the same error. I also went through the source code of the functions involved, but couldn't find the solution and started to get lost...
Here is a reprex:
## Load packages
library(survival)
library(gtsummary)
## Create the table using tbl_uvregression
tab <- trial %>%
select(response,age,trt) %>%
tbl_uvregression(method = glm,
method.args = list(family = binomial),
y = response,
exponentiate = T)
## Add global p-values from analysis-of-variance through 'anova_fun' argument
tab %>%
add_global_p(anova_fun = car::Anova)
## And the same for tbl_regression
trial %>%
glm(response ~ age + trt, data = ., family = binomial("logit")) %>%
tbl_regression(exponentiate = T) %>%
add_global_p(anova_fun = car::Anova)
Maybe I just misunderstand how the anova_fun argument of add_global_p() is meant to be used. In that case, could somebody help me with it? Or is there at least another (simple) way to add ANOVA global p-values to a tbl_uvregression object fitted with clmm models?

Issue concerning the Y parameter in tbl_uvregression function inside a function

I am trying to supply the y parameter of the tbl_uvregression() function (gtsummary package) through a custom function. The idea is to create multiple tables inside my function and return them merged.
Here is an example of the code I am using:
#Loading libraries + example dataset from questionr package
library(haven)
library(tidyverse)
library(finalfit)
library(dplyr)
library(survey)
library(srvyr)
library(gtsummary)
library(glue)
library(gt)
library(knitr)
library(questionr)
data(hdv2003)
Here is the part where I have an issue:
reg_log <- function(dataframew, variables, by) {
##param1 : weighted dataframe
##param2 : vector containing variables we want in our graph
##param3 : the variable or column we want as our Y argument
Table <- tbl_uvregression(data = dataframew, include = variables, exponentiate = TRUE, method.args = list(family = quasibinomial()), y = by, method = survey::svyglm)
return(Table)
}
When I run this code outside of reg_log, I have no issue, but inside a function the y parameter of tbl_uvregression() does not seem to evaluate the argument and instead reads it literally. Here's the error I get when calling my function:
hdv2003w <- svydesign(ids = ~1, data = hdv2003, weights = ~hdv2003$poids) #setting the survey.design object
reg_log(hdv2003w, c("age", "sexe", "hard.rock", "sport"), "sport")
x There was an error constructing model survey::svyglm(formula = by ~ age, design = ., family = quasibinomial()) See error below. Error: Problem with mutate() column model.
i model = map(...).
x Error in svyglm.survey.design(formula = by ~ age, design = structure(list(: all variables must be in design= argument
I am aware that the y parameter requires an unquoted name, but even when I use substitute() it does not work. I have resigned myself to enumerating the possibilities with switch(), but if anyone knows how to resolve this properly, that would be awesome.
Thanks.
The tbl_uvregression() function expects an unquoted input for y=, rather than a string with the outcome name. I updated your function to account for the string input.
library(gtsummary)
library(questionr)
data(hdv2003)
reg_log <- function(dataframew, variables, by) {
tbl_uvregression(
data = dataframew,
include = all_of(variables),
exponentiate = TRUE,
method.args = list(family = quasibinomial()),
y = !!rlang::sym(by),
method = survey::svyglm
)
}
hdv2003w <- survey::svydesign(ids = ~1, data = hdv2003, weights = ~hdv2003$poids) #setting the survey.design object
tbl <-
reg_log(hdv2003w, c("age", "sexe", "hard.rock"), "sport")
Created on 2021-11-12 by the reprex package (v2.0.1)
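To illustrate what `!!rlang::sym(by)` does in the answer above, here is a small sketch outside gtsummary. It only shows the string-to-symbol step and is not how tbl_uvregression() is implemented internally; the variable names are illustrative.

```r
library(rlang)

by <- "sport"        # outcome name supplied as a string
y_sym <- sym(by)     # the symbol `sport`

# `!!` injects the symbol while the expression is being captured,
# so the result mentions `sport` rather than `by`.
f_expr <- expr(!!y_sym ~ age)
deparse(f_expr)
# "sport ~ age"

# Base R alternative that builds an actual formula from strings.
f_base <- reformulate("age", response = by)
```

Without the injection step, the captured expression would contain the literal name by, which matches the `by ~ age` formula shown in the error message.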

Can't give a subset when using randomForest inside a function

I want to create a function that calls randomForest() from the randomForest package. It takes a "subset" argument, a vector of row numbers of the data frame to use for training. However, if I use this argument when calling randomForest() inside another function, I get the error:
Error in eval(substitute(subset), data, env) :
object 'tr_subset' not found
Here is a reproducible example, where we attempt to train a random forest to classify a response "type" as either "A" or "B", based on three numerical predictors:
library(randomForest)
# define a random data frame to train with
train.data = data.frame(
type = rep(NA, times = 500),
x = runif(500),
y = runif(500),
z = runif(500)
)
train.data$type[runif(500) >= 0.5] = "A"
train.data$type[is.na(train.data$type)] = "B"
train.data$type = as.factor(train.data$type)
# define the training range
training.range = sample(500)[1:300]
# formula to use
tr_form = formula(type ~ x + y + z)
# Function that includes the randomForest function
train_rf = function(form, all_data, tr_subset) {
p = randomForest(
formula = form,
data = all_data,
subset = tr_subset,
na.action = na.omit
)
return(p)
}
# test the new defined function
test_tree = train_rf(form = tr_form, all_data = train.data, tr_subset = training.range)
Running this gives the error:
Error in eval(substitute(subset), data, env) :
object 'tr_subset' not found
If, however, subset = tr_subset is removed from the randomForest() call and tr_subset is removed from train_rf(), the code runs fine, but the whole data set is used for training!
Note that using the subset argument of randomForest() outside of another function works completely fine, and is the intended method for the function, as described in the vignette linked above.
I know that in the meantime I could build a separate training set containing only the required rows and train on all of it, but is there a reason why my original code doesn't work?
Thanks.
EDIT: I conjecture that, because subset() is a base R function, R is getting confused and thinks I want the base R function rather than the argument of randomForest(). I'm not an expert, though, so I may be wrong.
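The error text points at eval(substitute(subset), data, env), which is how stats::model.frame() evaluates subset: first in the data, then in the environment of the formula. Assuming randomForest's formula interface goes through the same mechanism (an assumption, not checked against its source), the sketch below reproduces the failure with lm() as a stand-in and shows two possible workarounds, including the pre-subsetting one mentioned in the question. All function and variable names here are hypothetical.

```r
set.seed(1)
toy <- data.frame(x = runif(50), y = runif(50))
rows <- sample(50)[1:30]
form <- y ~ x   # created here, so its environment is not the function's

# Fails: `tr_subset` only exists inside the function, but subset is
# looked up in the data and then in the formula's environment.
fit_broken <- function(form, all_data, tr_subset) {
  lm(form, data = all_data, subset = tr_subset)
}
msg <- tryCatch(fit_broken(form, toy, rows),
                error = function(e) conditionMessage(e))

# Workaround A: subset the rows before fitting.
fit_rows <- function(form, all_data, tr_subset) {
  lm(form, data = all_data[tr_subset, , drop = FALSE])
}

# Workaround B: point the formula at the function's environment.
fit_env <- function(form, all_data, tr_subset) {
  environment(form) <- environment()
  lm(form, data = all_data, subset = tr_subset)
}

fit_a <- fit_rows(form, toy, rows)
fit_b <- fit_env(form, toy, rows)
```

If this reading is right, the name clash with base R's subset() is not the cause; the lookup simply never reaches the body of the wrapper function.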

Invalid type (closure) for a variable that is not a function

loess.smooth <- function(dat) {
dat <- dat[complete.cases(dat),]
## response
vars <- colnames(dat)
## covariate
id <- 1:nrow(dat)
## define a loess filter function (fitting loess regression line)
loess.filter <- function (x, span) loess(formula = paste(x, "id", sep = "~"),
data = dat,
degree = 1,
span = span)$fitted
## apply filter column-by-column
new.dat <- as.data.frame(lapply(vars, loess.filter, span = 0.75),
col.names = colnames(dat))
}
When I try to apply loess.smooth to a dataframe, I get the error:
Error in model.frame.default(formula = paste(x, "id", sep = "~"), data = dat) :
invalid type (closure) for variable 'id'
I don't understand why this is a problem since id is not a function, which is implied by the error.
When I run through these lines of code outside of the function, it works perfectly fine and does exactly what I want it to do.
It is a scoping issue caused by passing a string to loess() instead of a formula. A string has no environment (environment() returns NULL for it), so loess() doesn't know where to look for id. If you wrap the string in as.formula(), it works: by default the resulting formula is assigned the local environment of the function call.
As for the cryptic error, it appears when a variable shares its name with a function from a loaded package: when the variable is not found where the model frame is evaluated, R keeps searching through the loaded packages and picks up the function instead. In my case, the id function was loaded by dplyr.
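A sketch of the as.formula() fix described above might look like the following. The function is renamed loess_smooth and the toy data frame is invented for the example; the smoothing settings are the ones from the question.

```r
loess_smooth <- function(dat) {
  dat <- dat[complete.cases(dat), , drop = FALSE]
  vars <- colnames(dat)        # responses
  id <- seq_len(nrow(dat))     # covariate, local to this function

  loess_filter <- function(x, span) {
    # as.formula() records the calling environment, whose parent is
    # loess_smooth(), so the local `id` is found before any package.
    f <- as.formula(paste(x, "id", sep = "~"))
    loess(f, data = dat, degree = 1, span = span)$fitted
  }

  # Apply the filter column by column and keep the column names.
  smoothed <- lapply(vars, loess_filter, span = 0.75)
  as.data.frame(setNames(smoothed, vars))
}

set.seed(42)
toy <- data.frame(a = cumsum(rnorm(40)), b = cumsum(rnorm(40)))
smoothed <- loess_smooth(toy)
```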

Calling a formula within a custom function using quosures

I am trying to run a t-test within a custom function, and am running into a quosure misapplication (I believe). Any help would be greatly appreciated.
library(tidyverse)
tp_pull <- function(mydata, dv, iv){
dv <- enquo(dv)
iv <- enquo(iv)
t.test(!!dv ~ !!iv, mydata)
}
tp_pull(mydata = mtcars, dv = mpg, iv = vs)
My error message reads:
numerical expression has 2 elements: only the first used
NAs introduced by coercion
Error in quo_name(dv):~!(!iv) : NA/NaN argument
For context this t-test will be part of a larger custom function.
Quosures are specific to tidy evaluation and are not supported by the base R language. They only work with functions written for tidy evaluation, such as those in dplyr, so it is very unlikely they will ever work with base functions such as t.test.
If you want to do this with base R, you can use G. Grothendieck's suggestion of
tp_pull <- function(mydata, dv, iv){
t.test(formula(substitute(dv ~ iv)), mydata)
}
tp_pull(mydata = mtcars, dv = mpg, iv = vs)
substitute() captures the unevaluated symbol names from the promises passed to the call and lets you reassemble them into a new expression. The formula() call then coerces the unevaluated expression returned by substitute() into a proper R formula object.
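To see the two steps in isolation, the sketch below splits the formula construction into a helper (make_formula is a hypothetical name) and adds a string-based variant using reformulate() for cases where the variable names arrive as character values.

```r
# Builds `dv ~ iv` from unquoted argument names.
make_formula <- function(dv, iv) {
  formula(substitute(dv ~ iv))
}

f <- make_formula(mpg, vs)        # a formula object: mpg ~ vs
res <- t.test(f, data = mtcars)

# String-based alternative using base R only.
f_str <- reformulate("vs", response = "mpg")
res_str <- t.test(f_str, data = mtcars)
```

Both routes hand t.test() an ordinary formula, so no tidy evaluation machinery is involved.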
