I am working on developing a function, with the effects package in R. I am constantly running into a problem, which I can't debug.
I run the following code:
rm(list = ls()) # clear working directory
library(effects)
head(mtcars)
mod <- lm(mpg ~ gear*cyl + gear + cyl + carb, data=mtcars)
summary(mod)
eff.dat <- effect("gear*cyl", mod=mod, KR=TRUE, xlevels=list(gear=seq(3,5,1)))
eff.dat <- as.data.frame(eff.dat)
View(eff.dat)
It works like a charm, I get the effects for the interaction term on mpg when gear equals 3, 4, and 5 and the corresponding values of cyl.
However, once I put this into a function like:
proba <- function(term, model, main) {
eff.dat2 <<- effect(term, mod=model, KR=TRUE,
xlevels=list(main=seq(min(mtcars[[main]]),
max(mtcars[[main]]), 1)))
eff.dat2 <<- as.data.frame(eff.dat2)
}
proba("gear*cyl", model=mod, main="gear")
View(eff.dat2)
The xlevels part fails and the interaction term is estimated for the default values of gear, not the ones I specify. Obviously, this is part of a larger function, otherwise I would not bother to write something solely for effect.
First an illustration what happens:
foo <- function(x) {
list(x = x)
}
foo("bar")
#$x
#[1] "bar
Note how the list element is named x. setNames can be used to set names programmatically:
foo <- function(x) {
setNames(list(x), x)
}
foo("bar")
#$bar
#[1] "bar"
Also, you should avoid creating function side effects with <<-. It's very bad practice. Create a proper return value instead:
proba <- function(term, model, main) {
as.data.frame(
effect(term, mod=model, KR=TRUE,
xlevels= setNames(list(seq(min(mtcars[[main]]),
max(mtcars[[main]]), 1)), main))
)
}
eff.dat2 <- proba("gear*cyl", model=mod, main="gear")
all.equal(eff.dat, eff.dat2)
#[1] TRUE
Related
I have a custom function in r that involves writing and extracting terms from a model. For example:
myfunction <- function(data, outcome1, predictor1, predictor2){
modelx <- glm(reformulate(predictor1, outcome1), data = data)
modelx$aic
}
myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl")
In reality my function returns a dataframe, but I am making it simpler for the purposes of this question.
I would like to add a message to the function that automatically prints out the regression formula being called when I run my function. Outside of a function, I can print this information by running:
> reformulate("cyl", "mpg")
mpg ~ cyl
Therefore, I think I should be able to have my function print the message by adding something like this to the end of the function:
simpleMessage(reformulate(predictor1, outcome1))
However, I can't get this to work, and I understand that it has something to do with the fact that a function can only return one output value. How can I get this message to work?
Close. You can use message() directly, which actually does have a call to simpleMessage() within it, but chances are you don't have to actually worry about constructing a message.
You can pass the formula into message() (with the help of format() to print it out and still have your model output.
Below has this implemented as well as a quick demonstration for why format() is helpful with the formula class.
myfunction <- function(data, outcome1, predictor1, predictor2){
ref <- reformulate(predictor1, outcome1) # save as object to use twice
modelx <- glm(ref, data = data)
message(format(ref)) # message() with format()
modelx$aic
}
myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl")
#> mpg ~ cyl
#> [1] 169.3064
# wrap with suppressMessages() to silence
suppressMessages(myfunction(data = mtcars, outcome1 = "mpg", predictor1 = "cyl"))
#> [1] 169.3064
# Why format?
ref <- reformulate("a", "b")
print(ref)
#> b ~ a
as.character(ref)
#> [1] "~" "b" "a"
format(ref)
#> [1] "b ~ a"
Created on 2022-10-06 with reprex v2.0.2
How can you create a function that calls some predefined functions simultaneously?
E.g. I have 3 different functions like
myplot(data)
fmodel(data)
mymodel <- fmodel(data)
myconclusion(model = mymodel)
Now I want to create a new function that calls those predefined functions (from 1 to 3). What should I do?
I tried to do something like the below and receive the following error message, but I don't what was wrong.
P/s: my model involves linear regression and I've already put in the 'data' arguments.
myplot(mydata)
fmodel(mydata)
myconclusion(mymodel)
funlist <- list(
F1 = myplot
F2 = fmodel
mymodel <- fmodel
F3 = myconclusion
)
callfun <- function(funrange, data, ...){
for(i in funrange){
funlist[[i]](...)
}
}
callfun(1:3, data = mydata)
#Error in model.frame.default(formula = Y ~ X, data = mydata, drop.unused.levels = TRUE) :
#argument "data" is missing, with no default
Running the 3 functions inside another function should execute them, However, depending on what the functions actually do, there may not be any visible output.
f1 <- function(mydata, mymodel){
myplot(mydata)
fmodel(mydata)
myconclusion(mymodel)
}
f1(mydata, mymodel)
Again, depending on what these functions actually do will dictate the output.
EDIT
Here is an example for you
my_plot <- function(my_data){
my_data %>%
ggplot(aes(mpg, hp))+
geom_point()
}
my_model <- function(my_data){
my_data %>%
lm(mpg ~ hp, data = .) %>%
summary
}
my_model_2 <- function(my_data){
my_data %>%
lm(mpg ~ disp, data = .) %>%
summary
}
f1 <- function(my_data){
my_plot(my_data)
my_model(my_data)
my_model_2(my_data)
}
If you call f1(mtcars), all you will see is the output from my_model_2(), because that was the last function to be executed. my_plot() and my_model() were still executed, but you just couldn't see the results because all it does is preview a plot in the viewer, or print the model summary to the console.
One way to 'see' the plot produced by my_plot() is to change what it does, from previewing a plot in the viewer, to saving a copy of the plot. This may be done like this:
my_plot <- function(my_data){
my_data %>%
ggplot(aes(mpg, hp))+
geom_point()
ggsave('my_saved_plot.png')
}
Or, wrapping each function inside print will print the model summaries to the console, and show the plot in the viewer
f1 <- function(my_data){
print(my_plot(my_data))
print(my_model(my_data))
print(my_model_2(my_data))
}
I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the default environment (the actual function evaluates the model call in the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and different wrapper functions for calling make_mods from lapply but it seems like my function, in using substitute only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment, and when passed data frames from within lapply.
The only thing that I could think of was to add an if statement to my make mods function that tests if the input is a call or not. If it's a call, it expects it to be a call for a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.
I have an R function programmed to stop when input is not a call from "aov()" or "lm()".
Below, I expect when using fit3 as input, my function to stop, but I'm wondering why it does not?
P.S. The function correctly stops when fed fit4, BUT doesn't stop when fed fit3; WHY?
fit2 <- aov(mpg ~ wt, data = mtcars)
library(rstanarm)
fit3 <- stan_glm(mpg ~ wt, data = mtcars) # This call is from "rstanarm" package !!
fit4 <- glm(vs~mpg, data = mtcars)
bb <- function(fit = NA){
if(!(any(is.na(fit)))){
if(fit$call[1] != "lm()" && fit$call[1] != "aov()") stop("Error") else "OK"
}
}
# Examples of use:
bb(fit = fit4) # stops as expected ! because call is not from "lm()" or "aov()"
bb(fit = fit3) # I expect HERE to stop also; why it doesn't? !!!!
This is the reason:
> !(any(is.na(fit3)))
[1] FALSE
is.na(fits3) returns a logical named vector. Maybe you tried to check if is.na(fit3$call), instead of all elements of the fit3 object?
I'm using an R function which requires a list of variables as input arguments in the following format:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Basically the input arguments are column names in the table (and are not to be enclosed in ""). I have a list of these variables that I want to add one by one; i.e. run the function with one variable first, get the output, and incrementally adding variables (getting an output each time) i.e.
iteration 1:
output <- funName(gender ~ height, data=tableName)
iteration 2:
output <- funName(gender ~ height + weight, data=tableName)
iteration 3:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Is this possible?
Try the following:
# vector of variable names
myNames <- c("gender", "height", "weight", "varName4")
# print out results
for(i in 2:4) {
print(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))))
}
Of course, you can replace print with the appropriate funName, such as lm, along with additional arguments. So
for(i in 2:4) {
lm(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))), data=tableName)
}
Should work as you would expect it to. You could also use lapply if you wanted to save the results in an orderly fashion:
temp <- lapply(2:4, function(i) as.formula(paste(myNames[1], "~",
paste(myNames[2:i], collapse="+"))))
will save a list of formulas, for example.
Using the reformulate function as mentioned by #ben-bolker, you can simplify the web of paste functions:
for(i in 2:4) {
print(reformulate(myNames[2:i], response = myNames[1], intercept = TRUE))
}