R saving function call with formula for reuse in bootstrapping - r

I've got some code that creates an object from a formula and saves the call for future use, as so:
create_obj <- function(formula, data) {
obj <- list()
# Populate obj with whatever we're interested in
# ...
# Save call for future use
obj$call <- match.call()
obj
}
obj <- create_obj(time ~ sex, data)
If I then bootstrap data I can easily build the model on the new dataset:
data <- data[sample(1:nrow(data), replace=T), ]
new_obj <- eval(obj$call)
However, if I have the formula saved in a variable and I pass the object into a new environment this won't work:
do_stuff <- function(object, newdata) {
data <- newdata[sample(1:nrow(newdata), replace=T), ]
new_object <- eval(object$call)
}
main <- function() {
my_form <- time ~ sex
obj2 <- create_obj(my_form, data)
# obj2$call: 'create_obj(formula = my_form, data = data)'
do_stuff(obj2, data)
}
Error: object my_form not found.
How can I have it so that obj$call saves time~sex rather than myform? Otherwise I need to pass the formula itself around rather than just the object, limiting the practicality.
The above example isn't reproducible but you can see the same thing with a standard lm call.
EDIT: I've solved the problem, see the accepted answer.

I've solved it by having the constructor function modify the call by evaluating the constant argument in the local environment:
create_obj <- function(formula, data) {
obj <- list()
# Populate obj with whatever we're interested in
# ...
# Save call for future use
func_call <- match.call()
func_call$formula <- eval(formula)
# obj_call is now: create_obj(formula=time~sex, data=data)
obj$call <- func_call
obj
}

Related

Store results of function in complex object (?) in R

everbody knows this functions:
fitted_lm = lm(mpg ~ wt, mtcars)
> class(fitted_lm)
[1] "lm"
fitted_lm$
In my global environment I see that fitted_lm is stored as a list, but when i call the class function on the object, I get "lm" as result. These object allows it to easily access different values with the "$"-sign. How can I store my own results that kind of way?
For example something like this:
complex_output <- function(x) {
row.means <- rowMeans(x)
col.means <- colMeans(x)
result <- list(row.means, col.means)
return(result)
}
complex_result <- complex_output(x = mtcars)
complex_result[[1]]
complex_result$ # does not work
I can of course access all results via double brackets, but it would be much more convenient to access them with the "$"-sign. How can I do that?
Try assigning names to the elements in the list generated by your function:
complex_output <- function(x) {
row.means <- rowMeans(x)
col.means <- colMeans(x)
result <- list(r=row.means, c=col.means)
return(result)
}
complex_result$r

How do I write an adverb function that returns a function evaluated in a different environement?

This is closely related to the question: How do I pass ``...`` to a new environment in R?
And the thread here.
My ultimate goal is to be able to have a function that:
operates on a function and returns a function
The return function creates a new environment with .GlobalEnv as its parent
it evaluates the argument function inside the new environment.
And it solves the save-size problem described below.
The save size problem is the problem that lm (and other, for example ggplot) objects save their calling environments, which can often contain irrelevant information. The goal is to have a convenient wrapper that fixes this problem. To demonstrate:
saveSize <- function (object) {
tf <- tempfile(fileext = ".RData")
on.exit(unlink(tf))
save(object, file = tf)
file.size(tf)
}
tmp_fun <- function(){
iris_big <- lapply(1:10000, function(x) iris)
lm(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun()
object.size(out)
# 48008
saveSize(out)
# 1002448 - Far too large as it contains iris_big.
Bill Dunlap proposes this solution which works:
tmp_fun_Bill <- function(){
iris_big <- lapply(1:10000, function(x) iris)
env <- new.env(parent = globalenv())
with(env, lm(Sepal.Length ~ Sepal.Width, data = iris))
}
out <- tmp_fun_Bill()
object.size(out)
# 48008
saveSize(out)
# 4478 - this works!
I would like to generalize Bill's approach into one of these functions that return functions (like purrr::safely).
My best attempt, with help from #MrFlick:
in_new_env <- function(.f){
function(...) {
params <- list(...)
env <- new.env(parent = globalenv())
# Change the environment of any formula objects
params <- lapply(params, function(x) {if (inherits("x","formula")) {environment(x)<-env}; x})
assign(".params.", params, envir = env)
env$.f <- .f
evalq(do.call(".f", .params.), envir=env)
}
}
tmp_fun_me <- function(){
iris_big <- lapply(1:10000, function(x) iris)
in_new_env(lm)(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun_me()
object.size(out)
# 48008
saveSize(out)
# 1002448 - too big again
Can someone point out what's going wrong here?
The problem is really the that the formula grabs the current environment. Here's a function that will set the environment for the formula to an empty environment and
dropenv <- function(x) {
env <- new.env(parent = globalenv())
if (inherits(x,"formula")) {
environment(x)<-env
}
x
}
tmp_fun_drop <- function(){
iris_big <- lapply(1:10000, function(x) iris)
lm(dropenv(Sepal.Length ~ Sepal.Width), data = iris)
}
But this requires evaulating and parsing the formula argument to the lm() function. Here's a possible workaround for your desired method calling
in_new_env <- function(.f){
function(formula, ...) {
formula <- dropenv(formula)
.f(formula, ...)
}
}
tmp_fun_drop <- function(){
iris_big <- lapply(1:10000, function(x) iris)
in_new_env(lm)(Sepal.Length ~ Sepal.Width, data = iris)
}
Now the function returned by in_new_env assumes that the first parameter will be a formula and will clear out the environment for that formula.

Appending functions results within a for loop

Not a reproducible example, but here is my problem. I have a data frame and am looping through a series of columns and performing forecasting. I've created a fuction that will handle the forecasting part as it's repetitive and place it within the for loop. So each iteration of the loop, the forecast_func loop will be used to perform the forecast and accuracy assessment, and rbind those results to the results data frame.
result = data.frame()
forecast_func = function(data){
data.train = rnorm(100)
data.test = rnorm(10)
library(forecast)
mod = auto.arima(data.train)
mod_forc = forecast(mod)$mean
mod_acc = accuracy(mod_forc, data.test)
result = rbind(result, data.frame(mod_forc))
}
result
for(i in ...){
...
forecast_func(data)
...
}
If I were doing this once, I know how to append the results to result. However, within a for loop, every time I use the function, I get an empty data frame. Obviously, it's not returning to the global environment. However, I was wondering what were some useful strategies for attaching this problem.
I believe you can use the parent assignment operator here (<<-):
forecast_func = function(data){
data.train = rnorm(100)
data.test = rnorm(10)
library(forecast)
mod = auto.arima(data.train)
mod_forc = forecast(mod)$mean
mod_acc = accuracy(mod_forc, data.test)
# assign to 'result' data frame in the parent (calling) scope
result <<- rbind(result, data.frame(mod_forc))
}
result = data.frame()
for (i in ...) {
...
forecast_func(data)
...
}
This being said, using the parent assignment operator is usually frowned upon, based on what I have seen here on SO and elsewhere.
One way to avoid having use <<- would be to refactor your forecast_func to return a data frame, instead of trying to persist the result internally:
forecast_func = function(data){
data.train = rnorm(100)
data.test = rnorm(10)
library(forecast)
mod = auto.arima(data.train)
mod_forc = forecast(mod)$mean
mod_acc = accuracy(mod_forc, data.test)
return(data.frame(mod_forc))
}
result = data.frame()
for (i in ...) {
...
# just rbind() here in the calling scope
result = rbind(result, forecast_func(data))
...
}

How to stop reference a data frame in a function

I want to build a function in such a way that once i supplied data='name of data frame' there is no need to write variable=data$variable as just writing variable name from the supplied data frame will serve the purpose
myfunction<-function(variable,data)
{
result=sum(data)/sum(variable)
return(result)
}
for example i have a data frame df
df<-data.frame(x=1:5,y=2:6,z=3:7,u=4:8)
I want to provide following input
myfunction(variable=x,data=df)
instead of below input to serve the purpose
myfunction(variable=df$x,data=df)
We can use non-standard evaluation:
myfunction <- function(variable, data) {
var <- eval(substitute(variable), data)
result = sum(data)/sum(var)
return(result)
}
# Test
myfunction(variable = x, data = df)
#[1] 6
The with or attach functions can help you here, see the ?with and ?attach documentation. Alternatively, you can supply the variable name as a character and use this in the function body. I.e. you can do something like this:
myfunction2 <- function(variable, data) {
result <- sum(data)/sum(data[[variable]])
return(result)
}
df <- data.frame(x=1:5,y=2:6,z=3:7,u=4:8)
myfunction2("x", df)
#[1] 6
Yet another resort is to use non-standard evaluation. A small example of this is something like:
myfunction3 <- function(variable, data) {
var.name <- deparse(substitute(variable))
result <- sum(data)/sum(data[[var.name]])
return(result)
}
myfunction3(variable = x, data = df)
#[1] 6

How to write R function that can take either a vector or formula as first argument?

I'm writing a function that I want to be able to take both a vector and a formula as a first argument. If it is a vector, I do some single variable calculations, if it is a formula, I analyze the first variable by the second variable (the second variable would always be a factor).
Here is my current code:
fun = function(formula,data) {
if (class(with(data,formula))=="formula") {
mod = model.frame(formula,data)
n.group=names(mod)[2]
group <- eval(parse(text=paste("mod$",n.group,sep=""))) #x
response <- model.response(mod) # y
return(table(response,group))
}
else {
return(table(with(data,formula)))
}
}
data(iris)
fun(Sepal.Length~Species,iris) # works correctly
fun(Sepal.Length,iris) # returns an error
The return value is just for illustration.
Cheers!
Try this:
fun.formula <- function(formula, data) {
mod = model.frame(formula, data)
n.group <- names(mod)[2]
group <- eval(parse(text=paste("mod$",n.group,sep=""))) #x
response <- model.response(mod) # y
table(response, group)
}
fun <- function(formula, data) {
ret <- try( table(eval(substitute(formula), data), silent = TRUE)
if (inherits(try, "try-error)) fun.formula(formula, data) else ret
}
# tests
fun(Sepal.Length ~ Species, iris)
fun(Sepal.Length, iris)
That said, this is a rather unusual interface and, instead, it might be better to specify the case where formula is a variable by passing its name as a character string in which case a more usual S3 implementation is possible:
fun2 <- function(formula, data, ...) UseMethod("fun2")
fun2.formula <- fun.formula
fun2.character <- function(formula, data) table(data[[formula]])
# tests
fun2(Sepal.Length ~ Species, iris)
fun2("Sepal.Length", iris) # with this approach use a character string
REVISED Now we use try and added an S3 approach.
Ideally, I would have solved this using an S3 approach, but I couldn't figure out how to do that. The following got the job done:
fun <- function(x,data) {
mod = try(model.frame(x,data),silent=T)
if (inherits(mod, "try-error")) {
x=data[,deparse(substitute(x))]
return(table(x))
}
else {
mod = model.frame(x,data)
n.group=names(mod)[2]
group <- eval(parse(text=paste("mod$",n.group,sep=""))) #x
response <- model.response(mod) # y
return(table(response,group))
}
}
fun(Sepal.Length~Species,iris) # works correctly
fun(Sepal.Length,iris) # works!

Resources