In this part of my Shiny app, I'll do a 'linear model' (lm()) regression, using the variables the user selects. There are three inputs:
input$lmTrendFun is a selectInput(), with the options c("Linear", "Exponential", "Logarithmic", "Quadratic", "Cubic"):
selectInput("lmTrendFun", "Select the model for your trend line.",
choices = c("Linear", "Exponential", "Logarithmic", "Quadratic", "Cubic"))
The second input is input$lmDep, also a selectInput(). I first call updateSelectInput() inside an observe() block, so its choices are the column names of the imported tibble.
The third input is input$lmInd, a checkboxGroupInput() whose choices are all the column names other than the one already selected as input$lmDep.
From that I want this output: the lm() (or rather, summary.lm() or summary(lm())) result for those variables. If I knew which they were, it would be simple:
if(input$lmTrendFun == "Linear"){
form <- yname ~ x1 + x2
}else if(input$lmTrendFun == "Exponential"){
form <- yname~ exp(x1) + exp(x2)
}else if(input$lmTrendFun == "Logarithmic"){
form <- yname~ log(x1) + log(x2)
}else if(input$lmTrendFun == "Quadratic"){
form <- yname ~ poly(x1, 2) + poly(x2, 2)
}else if(input$lmTrendFun == "Cubic"){
form <- yname ~ poly(x1, 3) + poly(x2, 3)
}
[...]
lm(form, data = .)
where the data (.) has the columns yname, x1 and x2.
However, I don't. So I believe I need some more generic function that can create the formula. How can this be done?
formulizer <- function() as.formula(paste0( input$lmDep, "~", switch(input$lmTrendFun,
Linear = paste0(input$lmInd, collapse=" + "),
Exponential = paste0("exp(", input$lmInd,")", collapse=" + "),
Logarithmic = paste0("log(", input$lmInd,")", collapse=" + "),
Quadratic = paste0("poly(", input$lmInd,", 2)", collapse=" + "),
Cubic = paste0("poly(", input$lmInd,", 3)", collapse=" + ") )))
> input <- list(lmInd=paste0("V", 1:5), lmTrendFun="Linear", lmDep="Vp")
> formulizer()
Vp ~ V1 + V2 + V3 + V4 + V5
<environment: 0x7fad1cf63d48>
> input <- list(lmInd=paste0("V", 1:5), lmTrendFun="Logarithmic", lmDep="Vp")
> formulizer()
Vp ~ log(V1) + log(V2) + log(V3) + log(V4) + log(V5)
<environment: 0x7fad01e694d0>
> input <- list(lmInd=paste0("V", 1:5), lmTrendFun="Quadratic", lmDep="Vp")
> formulizer()
Vp ~ poly(V1, 2) + poly(V2, 2) + poly(V3, 2) + poly(V4, 2) +
poly(V5, 2)
<environment: 0x7fad01f51d20>
> input <- list(lmInd=paste0("V", 1:5), lmTrendFun="Cubic", lmDep="Vp")
> formulizer()
Vp ~ poly(V1, 3) + poly(V2, 3) + poly(V3, 3) + poly(V4, 3) +
poly(V5, 3)
<environment: 0x7fad01f59690>
Consider switch with vectorized paste0 to build the terms with their transformations, then pass the terms into reformulate. Adjust the inputs below to the actual Shiny variables:
dep_term <- ...
ind_terms <- ...
form <- switch(input$lmTrendFun,
Linear = reformulate(ind_terms, response=dep_term),
Exponential = reformulate(paste0("exp(", ind_terms, ")"), response=dep_term),
Logarithmic = reformulate(paste0("log(", ind_terms, ")"), response=dep_term),
Quadratic = reformulate(paste0("poly(", ind_terms, ", 2)"), response=dep_term),
Cubic = reformulate(paste0("poly(", ind_terms, ", 3)"), response=dep_term)
)
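The same idea can be tried outside Shiny. The following is a minimal sketch, assuming the built-in mtcars data stands in for the imported tibble; build_formula() is a hypothetical helper, not part of the answer above.

```r
# Hypothetical helper: wraps the switch/reformulate pattern in a function so it
# can be tested without a running Shiny session.
build_formula <- function(dep, ind, trend = "Linear") {
  terms <- switch(trend,
    Linear      = ind,
    Exponential = paste0("exp(", ind, ")"),
    Logarithmic = paste0("log(", ind, ")"),
    Quadratic   = paste0("poly(", ind, ", 2)"),
    Cubic       = paste0("poly(", ind, ", 3)"),
    stop("unknown trend: ", trend)
  )
  # reformulate() joins the term labels with "+" and attaches the response
  reformulate(terms, response = dep)
}

# mtcars is only an assumption standing in for the user's data
f <- build_formula("mpg", c("wt", "hp"), "Quadratic")
fit <- lm(f, data = mtcars)
summary(fit)
```

In a Shiny server function, dep and ind would presumably come from input$lmDep and input$lmInd, and trend from input$lmTrendFun.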
I have multiple dataframes and I would like to evaluate (multiple) different models on each. MWE
df1 <- data.frame(A3 = c(-5, 5, 1),
B3 = c(0, 10, 1))
df2 <- data.frame(A4 = c(5, 15, 1),
B4 = c(10, 20, 1))
myfun <- function(arg1, arg2){ # arg1 =1 or 2
if (arg2 == 1){
eqn <- paste0("A", arg1+2) ~ paste0("B", arg1+2) + I(as.name(paste0("B", arg1+2))^2)
} else {
eqn <- paste0("A", arg1+2) ~ paste0("B", arg1+2) + I(as.name(paste0("B", arg1+2))^2) +I(as.name(paste0("B", arg1+2))^3)
}
return (lm(formula = eqn, data = eval(as.name(paste0("df", arg1)))
)
)
}
For example, if I run myfun(2,2) I would like to get lm(A4 ~ B4 + I(B4^2) + I(B4^3), data = df2). But whatever I try, I get the following error message: Error in (paste0("B", arg1 +2))^2 : non-numeric argument to binary operator. From what I read in ?I, I imagine this is because R isolates whatever is passed into I(), so it doesn't realize I am trying to transform a variable. Is that what is going on, and is it something I can fix? Also, is there a better way to estimate multiple models quickly? All the similar questions I found used the same data.frame across models, while I have to account for the response (and predictor) variables coming from different data frames for different models.
Maybe this is what you are looking for:
The issue is that you are doing a math operation on a string: with (paste0("B", arg1 +2))^2 you try to square a string, which is why you get the error. Instead, you can simply glue your formula together as a string and convert it to a formula via as.formula:
df1 <- data.frame(A3 = c(-5, 5, 1),
B3 = c(0, 10, 1))
df2 <- data.frame(A4 = c(5, 15, 1),
B4 = c(10, 20, 1))
myfun <- function(arg1, arg2){ # arg1 =1 or 2
if (arg2 == 1){
eqn <- paste0("A", arg1+2, " ~ B", arg1+2," + I(B", arg1+2, "^2)")
} else {
eqn <- paste0("A", arg1+2, " ~ B", arg1+2," + I(B", arg1+2, "^2) + I(B", arg1+2, "^3)")
}
return (lm(formula = as.formula(eqn), data = eval(as.name(paste0("df", arg1)))
)
)
}
myfun(2, 1)
#>
#> Call:
#> lm(formula = as.formula(eqn), data = eval(as.name(paste0("df",
#> arg1))))
#>
#> Coefficients:
#> (Intercept) B4 I(B4^2)
#> 0.84795 0.12281 0.02924
Another option is to construct the formula with glue:
myfun <- function(arg1, arg2){
eqn <- switch(arg2,
`1` = glue::glue("A{arg1 + 2}~ B{arg1+2} + I(B{arg1+2}^2)"),
glue::glue("A{arg1 + 2}~ B{arg1+2}",
"+ I(B{arg1+2}^2) + I(B{arg1+2}^3)")
)
model <- lm(eqn, data = get(paste0('df', arg1), envir = .GlobalEnv))
model$call <- as.formula(eqn)
return(model)
}
myfun(2, 1)
#Call:
#A4 ~ B4 + I(B4^2)
#Coefficients:
#(Intercept) B4 I(B4^2)
# 0.84795 0.12281 0.02924
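On the question of fitting many models quickly, one possible approach is to keep the data frames in a named list and pass each one to the fitting function directly, instead of looking it up by name with eval() or get(). This is a minimal sketch under stated assumptions: fit_poly() is a hypothetical helper, the data are simulated for illustration, and the response is assumed to sit in the first column with the predictor in the second.

```r
set.seed(1)
# Simulated stand-ins for df1 and df2 (not the data from the question)
dfs <- list(
  df1 = data.frame(A3 = rnorm(10), B3 = rnorm(10)),
  df2 = data.frame(A4 = rnorm(10), B4 = rnorm(10))
)

# Hypothetical helper: polynomial of the given degree in the second column
fit_poly <- function(data, degree) {
  resp <- names(data)[1]
  pred <- names(data)[2]
  # powers 2..degree become I(pred^k); empty when degree is 1
  higher <- sprintf("I(%s^%d)", pred, seq_len(degree)[-1])
  lm(reformulate(c(pred, higher), response = resp), data = data)
}

# One cubic model per data frame
fits <- lapply(dfs, fit_poly, degree = 3)
lapply(fits, formula)
```

Because the data frame is an argument, the function no longer depends on objects named df1, df2, and so on existing in the global environment.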
I have a problem with forward stepwise regression. My understanding is that I don't pass the argument Data correctly.
I have the function:
ForwardStep <- function(df,yName, Xs, XsMin) {
Data <- df[, c(yName,Xs)]
fit <- glm(formula = paste(yName, " ~ ", paste0(XsMin, collapse = " + ")),
data = Data, family = binomial(link = "logit") )
ScopeFormula <- list(lower = paste(yName, " ~ ", paste0(XsMin, collapse = " + ")),
upper = paste(yName, " ~ ", paste0(Xs, collapse = " + ")))
result <- step(fit, direction = "forward", scope = ScopeFormula, trace = 1 )
return(result)
}
When I try to run it with the following arguments:
df <- data.frame(Y= rep(c(0,1),25),time = rpois(50,2), x1 = rnorm(50, 0,1),
x2 = rnorm(50,.5,2), x3 = rnorm(50,0,1))
yName = "Y"
Xs <- c("x1","x2","x3")
XsMin <- 1
res <- ForwardStep(df,yName,Xs,XsMin)
I get an error:
Error in is.data.frame(data) : object 'Data' not found
But if I first define Data in the global environment, it works perfectly fine.
Data <- df[, c(yName,Xs)]
res <- ForwardStep(df,yName,Xs,XsMin)
I guess my use of the step function is wrong, but I don't know how to do it the right way.
You need to realize that formulas always have an associated environment, see help("formula"). One should never pass text to the formula parameter of model functions, never ever. If you do that, you will encounter scoping issues sooner or later. Usually, I'd recommend computing on the language instead, but you can also create the formulas from text in the correct scope:
ForwardStep <- function(df,Yname, Xs, XsMin) {
Data <- df[, c(Yname,Xs)]
f1 <- as.formula(paste(Yname, " ~ ", paste0(XsMin, collapse = " + ")))
fit <- glm(formula = f1,
data = Data, family = binomial(link = "logit") )
f2 <- as.formula(paste(Yname, " ~ ", paste0(XsMin, collapse = " + ")))
f3 <- as.formula(paste(Yname, " ~ ", paste0(Xs, collapse = " + ")))
ScopeFormula <- list(lower = f2,
upper = f3)
step(fit, direction = "forward", scope = ScopeFormula, trace = 1)
}
df <- data.frame(Y= rep(c(0,1),25),time = rpois(50,2), x1 = rnorm(50, 0,1),
x2 = rnorm(50,.5,2), x3 = rnorm(50,0,1))
YName = "Y"
Xs <- c("x1","x2","x3")
XsMin <- 1
res <- ForwardStep(df,YName,Xs,XsMin)
#Start: AIC=71.31
#Y ~ 1
#
# Df Deviance AIC
#<none> 69.315 71.315
#+ x1 1 68.661 72.661
#+ x3 1 68.797 72.797
#+ x2 1 69.277 73.277
(Public service announcement: step-wise regression is a garbage generator. There are better statistical techniques available.)
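To see what "associated environment" means in practice, here is a minimal sketch. make_formula() is a hypothetical helper used only for illustration; it shows that a formula created with as.formula() inside a function keeps a reference to that function's frame, which is where a local object such as Data can later be found.

```r
# Hypothetical helper: builds a formula from text inside a function.
make_formula <- function(y, xs) {
  Data <- data.frame()  # a local object, standing in for `Data` in ForwardStep
  as.formula(paste(y, "~", paste(xs, collapse = " + ")))
}

f <- make_formula("Y", c("x1", "x2", "x3"))

# The formula carries the frame of make_formula() as its environment,
# so a lookup of `Data` starting from the formula succeeds.
env <- environment(f)
exists("Data", envir = env, inherits = FALSE)
```

A bare character string carries no such environment, which is consistent with the 'Data' not found error in the question.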
I am trying to pass the response variable tv as a function argument into lm within an expression. I hope the code below makes it clearer what I am trying to achieve.
Preferably, I would like to do that using tidy evaluation.
Furthermore, I tried to replace expression from base R with tidyeval terminology, but I did not succeed.
library(tidyverse)
library(mice)
data <- boys[boys$age >= 8, -4]
imp <- mice(data, seed = 28382, m = 10, print = FALSE)
choose_vars <- function(predictor_vars) {
predictors <- predictor_vars %>%
str_c(collapse = " + ") %>%
str_c("~", .) %>%
rlang::parse_expr(.)
scope <- list(upper = predictors, lower = ~1)
my_expression <- expression(
f1 <- lm(tv ~ 1),
f2 <- step(f1, scope = scope))
fit <- with(imp, my_expression)
formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
}
my_vars <- c("age", "hgt", "wgt", "hc", "gen", "phb", "reg")
choose_vars(predictor_vars = my_vars)
I would like to be able to pass tv via my own function.
choose_vars(predictor_vars = my_vars, response_var = tv)
The original code derives from Stef van Buuren's book Flexible Imputation of Missing Data.
data <- boys[boys$age >= 8, -4]
imp <- mice(data, seed = 28382, m = 10, print = FALSE)
scope <- list(upper = ~ age + hgt + wgt + hc + gen + phb + reg,
lower = ~1)
expr <- expression(f1 <- lm(tv ~ 1),
f2 <- step(f1, scope = scope))
fit <- with(imp, expr)
formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
Not exactly what I wanted but I found a way to pass the response variable into the function. The result is the same as in the example from the book.
library(tidyverse)
library(mice)
data <- boys[boys$age >= 8, -4]
imp <- mice(data, seed = 28382, m = 10, print = FALSE)
My code
choose_vars <- function(imp_data, predictor_vars, response_var) {
predictors <- predictor_vars %>%
str_c(collapse = " + ") %>%
str_c("~", .) %>%
rlang::parse_expr(.)
scope <- list(upper = predictors, lower = ~1)
form <- str_c(response_var, " ~ 1")
fit <- imp_data %>%
mice::complete("all") %>%
lapply(function(x) { step(lm(formula = as.formula(form), data = x), scope = scope) } )
formulas <- lapply(fit, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
table(votes)
}
my_vars <- c("age", "hgt", "wgt", "hc", "gen", "phb", "reg")
my_table <- choose_vars(imp_data = imp, predictor_vars = my_vars, response_var = "tv")
Book example
scope <- list(upper = ~ age + hgt + wgt + hc + gen + phb + reg,
lower = ~1)
expr <- expression(f1 <- lm(tv ~ 1),
f2 <- step(f1, scope = scope))
fit <- with(imp, expr)
formulas <- lapply(fit$analyses, formula)
terms <- lapply(formulas, terms)
votes <- unlist(lapply(terms, labels))
stefs_table <- table(votes)
Compare results
identical(my_table, stefs_table)
[1] TRUE
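The string pasting and parsing steps can also be written with base R's reformulate(). The following is a minimal sketch without mice, assuming the built-in mtcars data in place of one completed data set; choose_terms() is a hypothetical helper, not the function from the answer.

```r
# Hypothetical helper: runs step() for one data set and returns the selected terms.
choose_terms <- function(data, predictor_vars, response_var) {
  upper <- reformulate(predictor_vars)                 # one-sided: ~ a + b + c
  start <- reformulate("1", response = response_var)   # response ~ 1
  fit <- step(lm(start, data = data),
              scope = list(upper = upper, lower = ~1),
              trace = 0)                                # trace = 0 silences the log
  labels(terms(fit))
}

# mtcars is only an assumption standing in for a completed data set
selected <- choose_terms(mtcars, c("wt", "hp", "qsec"), "mpg")
selected
```

With multiply imputed data, the same helper could presumably be applied to each element of mice::complete(imp, "all") via lapply, and the results tabulated as in the book example.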
I am using the following function to make formulas. I simply pass vectors of variable names, and the function makes sure everything ends up in the right place and that duplicate variable names are excluded:
formula <- function(depvar, indepvars, instruments=NULL, othervars=NULL) {
x <- c(indepvars, instruments, othervars)
totvars <- unique(x)
totvars <- totvars[!totvars %in% depvar]
formula <- as.formula(
paste(depvar, paste(totvars, collapse = " + "), sep = " ~ "))
return(formula)
}
indepvars <- c("indepvarA", "indepvarB", "indepvarC")
instruments <- c("IV_A", "IV_B")
# lm
formula("depvar", indepvars)
# 1st stage - IV's for indepvarC
formula("indepvarC", indepvars, instruments)
However, I want the option to write a more complicated formula (an ivreg formula), namely:
depvar ~ instrumentedvar + indepvars | instrumentvars + indepvars
I have been trying the following:
formula <- function(depvar, indepvars, instruments=NULL, instrumentedvar=NULL, othervars=NULL, twostage=NULL) {
x <- c(indepvars, instruments, othervars)
totvars <- unique(x)
totvars <- totvars[!totvars %in% depvar]
if (is.null(twostage)) {
formula <- as.formula(
paste(depvar, paste(totvars, collapse = " + "), sep = " ~ "))
} else {
totvarsB <- totvars[!totvars %in% instrumentedvar]
totvarsB <- c(as.character(totvarsB), as.character(instruments))
formula <- as.formula(
paste(depvar, paste(paste(totvars, collapse = " + "), paste("|", paste(totvarsB, collapse = " + " )), sep = " ~ ")))
}
return(formula)
}
indepvars <- c("indepvarA", "indepvarB", "indepvarC")
instruments <- c("IV_A", "IV_B")
instrumentedvar <- "indepvarC"
formula("indepvarC", indepvars, instruments, twostage=1)
But I cannot seem to get it right.
Define reform which takes a vector of names and outputs a string in which they are connected with plus signs. Then use sprintf to generate the final string and convert that using as.formula:
reform <- function(x) paste(x, collapse = " + ")
makeFo <- function(lhs, rhs1, rhs2 = NULL, env = parent.frame()) {
s <- sprintf("%s ~ %s", lhs, reform(c(rhs1, rhs2)))
if (!missing(rhs2)) s <- sprintf("%s | %s", s, reform(rhs2))
as.formula(s, env = env)
}
# test
makeFo("y", c("x1", "x2"))
## y ~ x1 + x2
makeFo("y", c("x1", "x2"), c("u1", "u2"))
## y ~ x1 + x2 + u1 + u2 | u1 + u2
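For the exact layout asked for in the question (depvar ~ instrumentedvar + indepvars | instrumentvars + indepvars), the same sprintf approach can be adapted. This is a minimal sketch; make_iv_formula() and its argument names are hypothetical, and it only builds the formula without fitting anything.

```r
# Hypothetical helper: exogenous regressors appear on both sides of "|",
# the instrumented variable only on the left, the instruments only on the right.
make_iv_formula <- function(depvar, exog, endog, instruments,
                            env = parent.frame()) {
  plus <- function(x) paste(x, collapse = " + ")
  s <- sprintf("%s ~ %s | %s",
               depvar,
               plus(c(endog, exog)),
               plus(c(instruments, exog)))
  as.formula(s, env = env)
}

f <- make_iv_formula(
  depvar      = "depvar",
  exog        = c("indepvarA", "indepvarB"),
  endog       = "indepvarC",
  instruments = c("IV_A", "IV_B")
)
# width.cutoff keeps the printed formula on one line
deparse(f, width.cutoff = 500L)
```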
When I fit a linear model in R via the lm() function, it is possible to pass a character vector of variables into the lm() formula (e.g. as described here or here). However, if I apply the same method to the selection() function of the sampleSelection package, the following error appears:
Error in detectModelType(selection, outcome) :
argument 'selection' must be a formula in function 'selection()'
Question: Is there a way to pass a character vector of variables into the selection() formula?
Below, you can find a reproducible example, which illustrates the problem:
# Example data
N <- 1000
y <- rnorm(N, 2000, 200)
y_prob <- c(rep(0, N / 2), rep(1, N / 2)) == 1
x1 <- y + rnorm(N, 0, 300)
x2 <- y + rnorm(N, 0, 300)
x3 <- y + rnorm(N, 0, 300)
x4 <- y + rnorm(N, 0, 300)
x5 <- y + rnorm(N, 0, 300)
y[1:(N / 2)] <- 0
data <- data.frame(y, x1, x2, x3, x4, x5, y_prob)
x_vars <- colnames(data)[colnames(data) %in% c("y", "y_prob") == FALSE]
# Estimate linear model via lm() --> works without any problems
lm(paste("y", "~", paste(x_vars, collapse = " + ")))
# Estimate Heckman model via selection()
library("sampleSelection")
# Passing of vector does not work
selection(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + ")),
paste("y", "~", paste(x_vars[3:5], collapse = " + ")), data)
# Formula has to be written manually
selection(y_prob ~ x1 + x2 + x3 + x4, y ~ x3 + x4 + x5, data)
Wrap your paste calls with as.formula:
selection(as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))),
as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data)
Call:
selection(selection = as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))), outcome = as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data = data)
Coefficients:
S:(Intercept) S:x1 S:x2 S:x3 S:x4 O:(Intercept) O:x3 O:x4 O:x5 sigma
-1.936e-01 -5.851e-05 7.020e-05 5.475e-05 2.811e-05 2.905e+02 2.286e-01 2.437e-01 2.165e-01 4.083e+02
rho
1.000e+00
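The difference comes down to the class of the object: a pasted string is a character vector, while as.formula() and reformulate() both return objects of class "formula". A minimal sketch using base R only (the variable names mirror the example above; no model is fitted here):

```r
x_vars <- paste0("x", 1:5)

# A pasted string is still just text
sel_text <- paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))
inherits(sel_text, "formula")

# reformulate() builds formula objects directly from the term labels
sel_f <- reformulate(x_vars[1:4], response = "y_prob")
out_f <- reformulate(x_vars[3:5], response = "y")
inherits(sel_f, "formula")

sel_f
out_f
```

These two objects could then be passed as the selection and outcome arguments in place of the as.formula(paste(...)) calls.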