Issue concerning the Y parameter in tbl_uvregression function inside a function - r

So I am trying to supply the y parameter of the tbl_uvregression function (gtsummary package) via a custom function. The idea is to create multiple tables inside my function and return them merged.
Here is an example of the code I am using:
#Loading libraries + example dataset from questionr package
library(haven)
library(tidyverse)
library(finalfit)
library(dplyr)
library(survey)
library(srvyr)
library(gtsummary)
library(glue)
library(gt)
library(knitr)
library(questionr)
data(hdv2003)
Here is the part where I have an issue :
reg_log <- function(dataframew, variables, by) {
##param1 : weighted dataframe
##param2 : vector containing variables we want in our graph
##param3 : the variable or column we want as our Y argument
Table <- tbl_uvregression(data = dataframew, include = variables, exponentiate = TRUE, method.args = list(family = quasibinomial()), y = by, method = survey::svyglm)
return(Table)
}
When I run this code outside of reg_log, I have no issue, but inside a function the y parameter of tbl_uvregression does not seem to evaluate the argument and instead reads it literally. Here's the error I get when calling my function:
hdv2003w <- svydesign(ids = ~1, data = hdv2003, weights = ~hdv2003$poids) #setting the survey.design object
reg_log(hdv2003w, c("age", "sexe", "hard.rock", "sport"), "sport")
x There was an error constructing model survey::svyglm(formula = by ~ age, design = ., family = quasibinomial()) See error below.
Error: Problem with mutate() column model.
i model = map(...).
x Error in svyglm.survey.design(formula = by ~ age, design = structure(list(: all variables must be in design= argument
I am aware that the y parameter requires unquoted syntax, but even using substitute() does not work. I have resigned myself to enumerating the possibilities with switch(), but if anyone knows how to resolve this properly, that would be awesome.
Thanks.

The tbl_uvregression() function expects an unquoted input for y=, rather than a string with the outcome name. I updated your function to account for the string input.
library(gtsummary)
library(questionr)
data(hdv2003)
reg_log <- function(dataframew, variables, by) {
tbl_uvregression(
data = dataframew,
include = all_of(variables),
exponentiate = TRUE,
method.args = list(family = quasibinomial()),
y = !!rlang::sym(by),
method = survey::svyglm
)
}
hdv2003w <- survey::svydesign(ids = ~1, data = hdv2003, weights = ~hdv2003$poids) #setting the survey.design object
tbl <-
reg_log(hdv2003w, c("age", "sexe", "hard.rock"), "sport")
Created on 2021-11-12 by the reprex package (v2.0.1)
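The same idea can be illustrated without gtsummary. The sketch below is only an illustration in base R on the built-in mtcars data; the helper name fit_by_name and the choice of columns are assumptions, not part of the answer above. A column name that arrives as a string has to be turned into a symbol or a formula before the modelling function sees it. Otherwise the argument name itself (by in the question) ends up in the formula.

```r
# Hypothetical helper: 'variable' and 'by' are column names given as strings.
fit_by_name <- function(data, variable, by) {
  # reformulate() builds the formula <by> ~ <variable> from the strings
  form <- reformulate(variable, response = by)
  glm(form, data = data, family = binomial())
}

fit <- fit_by_name(mtcars, "wt", "am")
formula(fit)   # uses the column names, not the argument names

# as.name() is the base R counterpart of rlang::sym(): string -> symbol
as.name("am")
```

With tbl_uvregression() the conversion is done by !!rlang::sym(by) instead, because y= expects an unquoted name rather than a formula.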

Related

R add multiple different inputs interacted within an lapply function

I am trying to run a complex repeated function in r using lapply to create multiple datasets and perform multiple similar analyses. My base function works fine, essentially being the following:
models <- paste0("outcome", 1:10, " ~ explanatoryvariable | fe_variable |0|fe_variable") |> lapply(\(x) felm(as.formula(x), data = df))
I then plot models based on that formula:
plot1 <- plot_model(models[[1]], show.values = TRUE, value.offset = .3, vline.color = "Blue", vline.size = 0.1, p.threshold = 0.05, colors = "Black", digits = 3)
I do this for all plots 1-10. This allows me to perform the same regression analysis for multiple versions of explanatoryvariable, which are all included in my dataset with their corresponding number 1-10.
This works perfectly; however, I now need to repeat this with a more complex formula. I need to add an interaction term to the original function. However, the variable being interacted is not the same between each model. Essentially, I need the following:
models <- paste0("outcome", 1:10, " ~ explanatoryvariable*interactionvariable(x) | fe_variable |0|fe_variable") |> lapply(\(x) felm(as.formula(x), data = df))
Where interactionvariable changes for each model.
So, when I create the plots
plot1 will include explanatoryvariable interacted with interactionvariable1 but plot2 will include explanatoryvariable interacted with interactionvariable2, and so on.
Because of the way my function with lapply was written before, I cannot insert this changing interaction variable into the pasted text. Is there a way to design an lapply function with these two changing inputs? How can I change the code to preserve the overall function for multiple models/plots, but adjust it for this more complex interaction term? Is it possible to have two sets of changing variables within the lapply function, especially given that they must be part of an interaction term?
Note: plot_model is from the sjPlot package and felm is from the lfe package
Update with attempts using map function
I tried using the map function and seem closer but am receiving errors:
input_fun<-function(data, input1, input2){
felm(as.formula(paste("outcome",input1," ~ explanatoryvariable*",input2, "| fe_variable |0|fe_variable", collapse = "", sep = "", data = data)))}
model <-map(.x = 1:2, .f = ~ input_fun(data= df, input1 = 1:10, input2 = c(interactionvariable1,interactionvariable2, ...)))
which returns the following error:
Error in str2lang(x) : <text>:69:396: unexpected symbol
I also tried:
input_fun<-function(data, input1, input2){
felm(as.formula(paste("outcome",input1," ~ explanatoryvariable*",input2, "| fe_variable |0|fe_variable |data = df", collapse = "", sep = "")))}
model <-map_chr(.x = 1:2, .f = ~ input_fun(data= df, input1 = 1:10, input2 = c(interactionvariable1,interactionvariable2, ...)))
and received:
Warning: invalid formula c("outcome1 ~ explanatoryvariable*interactionvariable1 | fe_variable |0|fe_variable | data = df", "outcome2 ~ explanatoryvariable*interactionvariable2 | fe_variable |0|fe_variable | data = df")...: assignment is deprecated
Error in class(ff) <- "formula" : cannot set attribute on a symbol
As @allan suggested, map seems to be on the right track, but how can I fix these errors?
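One possible direction, sketched under assumptions: the toy data frame and its column names are made up, and lm() stands in for lfe::felm() so that the example runs with base R alone. Because paste0() is vectorised, the outcome and the interaction variable can vary together in the formula strings, and Map() gives an equivalent version with two explicit inputs.

```r
set.seed(1)
# Toy data standing in for the real dataset (all column names are assumptions)
df <- data.frame(
  outcome1 = rnorm(50), outcome2 = rnorm(50),
  explanatoryvariable = rnorm(50),
  interactionvariable1 = rnorm(50), interactionvariable2 = rnorm(50)
)

outcomes     <- paste0("outcome", 1:2)
interactions <- paste0("interactionvariable", 1:2)

# paste0() is vectorised: element i of each vector ends up in formula i
formulas <- paste0(outcomes, " ~ explanatoryvariable * ", interactions)

# lm() is used here instead of felm(); with felm() the fixed-effect parts
# of the formula would be appended to the string in the same way
models <- lapply(formulas, function(f) lm(as.formula(f), data = df))

# Equivalent version with two explicit inputs
models2 <- Map(
  function(y, z) lm(as.formula(paste0(y, " ~ explanatoryvariable * ", z)), data = df),
  outcomes, interactions
)
```

Each fitted model can then be passed to plot_model() one at a time, as before.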

Strange glm() behavior in a function

Please help me understand the reproducible example below.
I am trying to write a function glm_func() that would call glm(). It works perfectly fine outside of a function.
However, if I pass the linear model formula as an argument, the function glm_func() gives out a strange error:
Error in eval(extras, data, env) : object 'modeldata' not found
Can someone help me understand what went wrong?
# Fully reproducible example
# Specify data
aa = data.frame(y=1:100, x1=1:100, x2=rep(1, 100), z=runif(100))
lm_formula = as.formula('y ~ x1 + x2')
weight_var = 'z'
# GLM works as-is outside of a function
model1 = glm(formula = lm_formula, data = aa, weights = aa[[weight_var]])
# Why does this function not work?
glm_func <- function(modeldata, formula, weight){
thismodel=glm(
formula = formula, #<----- Does not work if formula is passed from argument
data = modeldata, weights = modeldata[[weight]])}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
# This function works
glm_func2 <- function(modeldata, weight){
thismodel=glm(
formula = y ~ x1 + x2, #<----- Works if formula is hardcoded
data = modeldata, weights = modeldata[[weight]])}
glm_func2(modeldata=aa, weight=weight_var)
From help("formula"):
A formula object has an associated environment, and this environment
(rather than the parent environment) is used by model.frame to
evaluate variables that are not found in the supplied data argument.
Formulas created with the ~ operator use the environment in which they
were created. Formulas created with as.formula will use the env
argument for their environment.
From this one would expect that you don't need to care about the environment if you use the data argument. Sadly, that's not the case here, because the weights are evaluated within the formula's environment (thanks to @user2554330 for pointing this out!).
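A small base R sketch of what the quoted help text describes (the names make_formula, f_top, f_fun and f_char are only illustrative): a formula remembers the environment it was created in, and as.formula() lets you choose that environment explicitly.

```r
make_formula <- function() {
  y ~ x            # created inside the function: its environment is the function's frame
}

f_top  <- y ~ x    # created where this line is run (the global environment in a script)
f_fun  <- make_formula()
f_char <- as.formula("y ~ x", env = baseenv())  # environment chosen explicitly

environment(f_top)
identical(environment(f_fun), globalenv())   # FALSE
identical(environment(f_char), baseenv())    # TRUE
```

Anything that is not a column of data, such as the expression passed to weights, is looked up starting from that environment.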
So, you need to ensure that your function environment is associated with the formula:
glm_func <- function(modeldata, formula, weight){
environment(formula) <- environment()
glm(formula = formula, data = modeldata,
weights = modeldata[[weight]])
}
glm_func(modeldata=aa, formula=lm_formula, weight=weight_var)
#works
Personally, I'd do this instead:
glm_func <- function(modeldata, formula, weight){
environment(formula) <- environment()
eval(
bquote(
glm(formula = .(formula), data = modeldata,
weights = modeldata[[weight]])
)
)
}
This way, the actual formula is printed when you print the model object.
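To see the difference, here is a sketch comparing the two variants on made-up data. The data frame and the function names fit_plain and fit_bquote are assumptions for illustration only.

```r
set.seed(42)
aa <- data.frame(x1 = rnorm(100), z = runif(100))
aa$y <- 2 * aa$x1 + rnorm(100)
lm_formula <- as.formula("y ~ x1")

fit_plain <- function(modeldata, formula, weight) {
  environment(formula) <- environment()
  glm(formula = formula, data = modeldata, weights = modeldata[[weight]])
}

fit_bquote <- function(modeldata, formula, weight) {
  environment(formula) <- environment()
  # .(formula) splices the formula object itself into the call before it runs
  eval(bquote(
    glm(formula = .(formula), data = modeldata, weights = modeldata[[weight]])
  ))
}

m1 <- fit_plain(aa, lm_formula, "z")
m2 <- fit_bquote(aa, lm_formula, "z")

deparse(m1$call$formula)  # "formula": only the argument name was recorded
deparse(m2$call$formula)  # "y ~ x1": the formula itself was recorded
```

Both fits have the same coefficients; only the recorded call differs.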
As @Roland commented, a formula object has an associated environment, so instead of passing a formula object you can pass the variable names and create the formula inside the function.
glm_func <- function(modeldata, resp, predictor, weight){
glm(formula = reformulate(predictor, resp),
data = modeldata, weights = modeldata[[weight]])
}
glm_func(modeldata=aa, 'y', c('x1', 'x2'), weight=weight_var)

Error message when using lrm() and validate() from the rms package

I am trying to use validate() from the rms package but get an error. More specifically, I fit an ordinal logistic model using lrm() and then assess my results with validate().
Please find some code below:
library(data.table)
library(rms)
DT2 <- data.table(internal_model_rating_number = c(5,5,5,5,6,6,5,5,5,5),
ratio = c(2.0665194,1.2264998,1.0333628,0.6936382,-0.1883890,
-0.2349949,-0.5062086,-0.5204016,-0.4401635,-0.5824366))
lrm.model <- lrm(internal_model_rating_number ~ ratio,
x = T,
y = T,
data = DT,
maxit = 1000)
validate(lrm.model, group=internal_model_rating_number, B=200, bw=T)
Note that the group argument when running the code for my actual data (not displayed) is necessary, because some categories of the target variable occur very rarely.
The piece of validate() code returns the error:
Error in predab.resample(fit, method = method, fit = lrmfit, measure = discrim, :
object 'internal_model_rating_number' not found
Do you know how I can solve this error?
I think the grouping variable has to be referenced through the data object, because validate() does not look for it inside the data.
Try this validate(lrm.model, group = DT2$internal_model_rating_number, B=200, bw=T)
You also had a small typo (try data = DT2 instead of DT.)
If you don't want to specify the data again, you can use attach(DT2) and then run your model.
library(data.table)
library(rms)
DT2 <- data.table(internal_model_rating_number = c(5,5,5,5,6,6,5,5,5,5),
ratio = c(2.0665194,1.2264998,1.0333628,0.6936382,-0.1883890,
-0.2349949,-0.5062086,-0.5204016,-0.4401635,-0.5824366))
attach(DT2)
lrm.model <- lrm(internal_model_rating_number ~ ratio,
x = T,
y = T,
data = DT2,
maxit = 1000)
validate(lrm.model, group = internal_model_rating_number, B=200, bw=T)
Once you're done, if you'd like to detach the data, use detach(DT2)

Can't give a subset when using randomForest inside a function

I want to create a function that internally calls the randomForest function from the randomForest package. This takes the "subset" argument, which is a vector of row numbers of the data frame to use for training. However, if I use this argument when calling randomForest inside another function, I get the error:
Error in eval(substitute(subset), data, env) :
object 'tr_subset' not found
Here is a reproducible example, where we attempt to train a random forest to classify a response "type" either "A" or "B", based on three numerical predictors:
library(randomForest)
# define a random data frame to train with
train.data = data.frame(
type = rep(NA, times = 500),
x = runif(500),
y = runif(500),
z = runif(500)
)
train.data$type[runif(500) >= 0.5] = "A"
train.data$type[is.na(train.data$type)] = "B"
train.data$type = as.factor(train.data$type)
# define the training range
training.range = sample(500)[1:300]
# formula to use
tr_form = formula(type ~ x + y + z)
# Function that includes the randomForest function
train_rf = function(form, all_data, tr_subset) {
p = randomForest(
formula = form,
data = all_data,
subset = tr_subset,
na.action = na.omit
)
return(p)
}
# test the new defined function
test_tree = train_rf(form = tr_form, all_data = train.data, tr_subset = training.range)
Running this gives the error:
Error in eval(substitute(subset), data, env) :
object 'tr_subset' not found
If, however, subset = tr_subset is removed from the randomForest call, and tr_subset is removed from the train_rf function, this code runs fine, but the whole data set is then used for training!
It should be noted that using the subset argument in randomForest when not defined in another function works completely fine, and is the intended method for the function, as described in the vignette linked above.
I know that in the meantime I could just define another training set that contains only the required rows and train on all of it, but is there a reason why my original code doesn't work?
Thanks.
EDIT: I conjecture that, as subset() is a base R function, R is getting confused and thinking you're wanting to use the base R function rather than defining an argument of the randomForest function. I'm not an expert, though, so I may be wrong.
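A sketch of what may be going on, using lm() in place of randomForest() so that it runs in base R. It assumes that randomForest's formula interface builds its model frame the same way lm() does, which the wording of the error message suggests; the data and function names are made up. In that mechanism, subset is looked up in the data and then in the environment of the formula, not in the function that makes the call.

```r
set.seed(1)
train_data <- data.frame(x = runif(100))
train_data$y <- 3 * train_data$x + rnorm(100)
training_rows <- sample(100)[1:60]
tr_form <- y ~ x   # created outside the function

# Same shape as train_rf(), with lm() standing in for randomForest()
fit_subset <- function(form, all_data, tr_subset) {
  lm(formula = form, data = all_data, subset = tr_subset)
}

# tr_subset exists neither in the data nor in the formula's environment,
# so this call fails; the error message is captured instead of stopping
res <- tryCatch(fit_subset(tr_form, train_data, training_rows),
                error = function(e) conditionMessage(e))
res

# Pointing the formula at the function's environment makes tr_subset visible
fit_subset_fixed <- function(form, all_data, tr_subset) {
  environment(form) <- environment()
  lm(formula = form, data = all_data, subset = tr_subset)
}
fit <- fit_subset_fixed(tr_form, train_data, training_rows)
nobs(fit)   # 60 rows used
```

If the assumption holds, adding environment(form) <- environment() at the top of train_rf() would be the corresponding change.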

User-Defined Function for lme model fits: error

I am beginning to write a function that builds linear mixed models with nlme. I am encountering an error: Error in eval(expr, envir, enclos) : object 'value' not found, which I believe is due to R not knowing where to find the data frame variables (e.g., value). If this is, in fact, why the error is occurring, how do I tell the function that value and timepoint belong to the variables in Dat in the (reproducible) code below?
require(nlme)
Dat <- data.frame(
id = sample(10:19),
Time = sample(c("one", "two"), 10, replace = T),
Value = sample(1:10)
)
nlme_rct_lmm <- function (data, value, timepoint,
ID) {
#base_level intercept only model
bl_int_only <- gls(value ~ 1,
data = data,
method = "ML",
na.action="na.omit")
#vary intercept across participants
randomIntercept <- lme(value ~ 1,
data = data,
random = ~1|ID,
method = "ML",
na.action = "na.omit")
#add timepoint as a fixed effect
timeFE <- lme(value ~ timepoint,
data = data,
random = ~1|ID,
method = "ML",
na.action = "na.omit")
}
nlme_rct_lmm(Dat, Value, Time, id)
This isn't (as you and I both expected) a problem with evaluation within different frames; rather, it's an issue of consistency between the variable names in the formula and those in the data. R is case-sensitive, so it matters whether you use value or Value, id or ID, etc. Furthermore, formula interpretation uses non-standard evaluation (NSE), so if you have a variable value equal to the symbol Value, value ~ 1 does not magically get transmuted to Value ~ 1. What I've outlined below works by passing the names of the response, time, and ID variables to the function as strings, because it's the easiest approach. It's a little more elegant for the end user if you use non-standard evaluation, but that's a bit harder to program (and therefore to understand, debug, etc.).
Below the easy/boneheaded approach, I also discuss how to implement the NSE approach (scroll all the way down ...)
Note that your example doesn't explicitly return anything: because its last expression is an assignment, the function only returns the last model (timeFE) invisibly, and the other fits are discarded when the function finishes. You might want to return the results as a list (or perhaps your real function will do some other stuff with the fitted models, such as a series of model tests, and return those answers as the results ...)
require(nlme)
Dat <- data.frame(
ID = sample(10:19),
Time = sample(c("one", "two"), 10, replace = T),
Value = sample(1:10)
)
nlme_rct_lmm <- function (data, value, timepoint,
ID) {
nullmodel <- reformulate("1",response=value)
fullmodel <- reformulate(c("1",timepoint),response=value)
remodel <- reformulate(paste("1",ID,sep="|"))
#base_level intercept only model
bl_int_only <- gls(nullmodel,
data = data,
method = "ML",
na.action="na.omit")
#vary intercept across participants
randomIntercept <- lme(nullmodel,
data = data,
random = remodel,
method = "ML",
na.action = "na.omit")
#add timepoint as a fixed effect
timeFE <- lme(fullmodel,
data = data,
random = remodel,
method = "ML",
na.action = "na.omit")
}
nlme_rct_lmm(Dat, "Value", "Time", "ID")
If you want something a bit more elegant (but internally obscure), you can substitute the following lines for the ones defining the models. The inner substitute() calls retrieve the symbols that were passed to the function as arguments; the outer substitute() calls insert those symbols into the formula.
nullmodel <- formula(substitute(v~1,list(v=substitute(value))))
fullmodel <- formula(substitute(v~t,list(v=substitute(value),
t=substitute(timepoint))))
remodel <- formula(substitute(~1|i,list(i=substitute(ID))))
Now this would work, without specifying the variables as strings, as you expected: nlme_rct_lmm(Dat, Value, Time, ID)
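A cut-down sketch of the NSE approach, using lm() rather than nlme so that it runs in base R. The data and the function name fit_nse are made up, and eval() is used here instead of formula() to turn the substituted call into a formula.

```r
set.seed(1)
Dat <- data.frame(
  Time  = rep(c("one", "two"), each = 5),
  Value = rnorm(10)
)

fit_nse <- function(data, value, timepoint) {
  # inner substitute(): the expressions the caller typed (Value, Time)
  # outer substitute(): splices them into the template v ~ t
  form <- eval(substitute(v ~ t,
                          list(v = substitute(value), t = substitute(timepoint))))
  lm(form, data = data)
}

fit <- fit_nse(Dat, Value, Time)
formula(fit)   # Value ~ Time
```

The caller can now write bare column names, at the cost of a function body that is harder to read and debug.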
