How to stop referencing a data frame in a function


I want to build a function such that, once I supply data = <name of data frame>, I no longer need to write variable = data$variable; writing just the variable name from the supplied data frame should be enough.
myfunction<-function(variable,data)
{
result=sum(data)/sum(variable)
return(result)
}
For example, I have a data frame df:
df<-data.frame(x=1:5,y=2:6,z=3:7,u=4:8)
I want to provide the following input:
myfunction(variable=x,data=df)
instead of the input below:
myfunction(variable=df$x,data=df)

We can use non-standard evaluation:
myfunction <- function(variable, data) {
var <- eval(substitute(variable), data)
result = sum(data)/sum(var)
return(result)
}
# Test
myfunction(variable = x, data = df)
#[1] 6
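To see what the two steps in that function do separately, here is a minimal sketch. show_capture is a hypothetical helper introduced only for illustration: substitute() returns the unevaluated argument, and eval() then looks up its symbols in the data frame first, falling back to the caller's environment.

```r
# show_capture() is a hypothetical helper used only for illustration
show_capture <- function(variable, data) {
  expr <- substitute(variable)                # the unevaluated argument
  value <- eval(expr, data, parent.frame())   # columns of data first, then the caller
  list(expr = expr, value = value)
}

df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)
res <- show_capture(x * 2, df)
res$expr    # the call x * 2, not yet evaluated
res$value   # column x of df, doubled
```

Passing parent.frame() as the enclosure is a design choice: it lets the expression mix column names with variables defined where the function was called.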

The with or attach functions can help you here, see the ?with and ?attach documentation. Alternatively, you can supply the variable name as a character and use this in the function body. I.e. you can do something like this:
myfunction2 <- function(variable, data) {
result <- sum(data)/sum(data[[variable]])
return(result)
}
df <- data.frame(x=1:5,y=2:6,z=3:7,u=4:8)
myfunction2("x", df)
#[1] 6
Yet another option is to use non-standard evaluation. A small example:
myfunction3 <- function(variable, data) {
var.name <- deparse(substitute(variable))
result <- sum(data)/sum(data[[var.name]])
return(result)
}
myfunction3(variable = x, data = df)
#[1] 6
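For the with() route mentioned at the start of this answer, a minimal sketch might look like the following. It assumes the same df as in the question; with() evaluates its second argument using the columns of df as variables, so x needs no df$ prefix.

```r
df <- data.frame(x = 1:5, y = 2:6, z = 3:7, u = 4:8)

# Inside with(), x resolves to df$x; df itself is found in the calling environment
ratio <- with(df, sum(df) / sum(x))
ratio
#[1] 6
```

attach() would also make x visible, but it changes the search path for the whole session, so with() is usually the less surprising choice.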

Related

control scoping of arguments supplied to lm() from within a function called by lapply

I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the global environment (the actual function evaluates the model call inside the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and with different wrapper functions for calling make_mods from lapply, but it seems that my function, because it uses substitute, only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment and when passed data frames from within lapply?
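The X[[i]] in those calls comes from how lapply invokes its function argument. A minimal sketch of the effect, using a hypothetical helper show_arg that only reports what substitute() sees:

```r
# show_arg() is a hypothetical helper: it returns the text of its unevaluated argument
show_arg <- function(dataset) deparse(substitute(dataset))

ex <- list(a = mtcars, b = iris)

direct <- show_arg(mtcars)                   # called directly: the name as typed
via_lapply <- unlist(lapply(ex, show_arg))   # called by lapply: lapply's internal expression
direct
via_lapply
```

Called directly, the helper sees mtcars; called through lapply, it sees the expression lapply itself used to pass each element, which is why the captured data argument no longer names a global object.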

The only thing that I could think of was to add an if statement to my make_mods function that tests whether the input is a call. If it is a call, the function expects it to be a call that refers to a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.
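As a quick check that calls of this shape really are evaluable, here is a small self-contained sketch. It builds one such call by hand with bquote() instead of going through make_mods, and the single-element ex_l is an assumption made to keep the example short.

```r
# A one-element stand-in for the list of datasets used above
ex_l <- list(ex1 = mtcars[c(3, 4, 6, 1)])

# Build the data argument as an unevaluated call, then splice it into the lm call
d_sub <- bquote(ex_l[[.("ex1")]])
cl <- bquote(lm(mpg ~ disp, data = .(d_sub)))
cl

# Evaluating the call fits the model; the stored call still names ex_l[["ex1"]]
fit <- eval(cl)
fit$call
```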

Store results of function in complex object (?) in R

Everybody knows functions like this one:
fitted_lm = lm(mpg ~ wt, mtcars)
> class(fitted_lm)
[1] "lm"
fitted_lm$
In my global environment I see that fitted_lm is stored as a list, but when I call the class function on the object, I get "lm" as the result. Such objects make it easy to access different values with the "$" sign. How can I store my own results in that kind of way?
For example something like this:
complex_output <- function(x) {
row.means <- rowMeans(x)
col.means <- colMeans(x)
result <- list(row.means, col.means)
return(result)
}
complex_result <- complex_output(x = mtcars)
complex_result[[1]]
complex_result$ # does not work
I can of course access all results via double brackets, but it would be much more convenient to access them with the "$"-sign. How can I do that?

Try assigning names to the elements in the list generated by your function:
complex_output <- function(x) {
row.means <- rowMeans(x)
col.means <- colMeans(x)
result <- list(r=row.means, c=col.means)
return(result)
}
complex_result <- complex_output(x = mtcars)
complex_result$r
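The question also asks why class() reports "lm" for something stored as a list. That comes from a class attribute set on the list. A minimal sketch of the same idea for the function above, where the class name "complex_output" and the print method are assumptions chosen for illustration:

```r
complex_output <- function(x) {
  result <- list(row.means = rowMeans(x), col.means = colMeans(x))
  class(result) <- "complex_output"   # hypothetical class name
  result
}

# Optional S3 print method so the object prints compactly
print.complex_output <- function(x, ...) {
  cat("complex_output with", length(x$row.means), "row means and",
      length(x$col.means), "column means\n")
  invisible(x)
}

res <- complex_output(mtcars)
class(res)       # "complex_output"
res$col.means    # still a list underneath, so $ works
```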

R: substitute and execution environments

I am currently writing a function that will take an equation as an argument. The function expects the variables in the equation to be among the column names of data.
mydata <- data.frame(x=c(1,2,3,4),y=c(5,6,7,8), z=c(9,10,11,12))
my_function <- function(data, equ) {
EQU.sub <- deparse(substitute(equ))
#Check if colnames are used
for(i in 1:length(colnames(data))) {
if(str_detect(string = EQU.sub, pattern = colnames(data)[i])) {
#if used, create variable with its name.
assign(x = colnames(data)[i],
value = eval(parse(text = paste("data$",
colnames(data)[i],
sep = ""))))
} else {
warning(paste(colnames(data)[i], "was not used in EQU"))
}
}
df$new.value <- eval(equ)
output <- function(new.equ = equ)
return(df)
}
my_function(data = mydata, equ = x+(y^2))
I know what you may be thinking, this is a big workaround for just doing
mydata$x+(mydata$y^2)
THE ISSUE
The issue is that I want to pass my input of equ into a new function.
new_function <- function(new.equ) {
string <- deparse(substitute(new.equ))
#does some stuff....
return(output) }
However, when moving from the execution environment of my_function to that of new_function, calling deparse(substitute(new.equ)) returns "equ" instead of "x+(y^2)".
I know that substitute returns what was explicitly supplied as the argument (here, equ), but I am wondering if there is a way for new_function() to see into the execution environment of my_function() so I can get the desired output of "x+(y^2)".
UPDATE
After thinking about it, I could change what I pass to new.equ to the deparsed version of equ as follows...
output <- function(new.equ = EQU.sub)
new_function <- function(new.equ) {
#given that these variables are available
value <- parse(text = new.equ)
#does some stuff....
return(output) }
but my original question still stands because I'm still new to R environments. Is there a more elegant way to go through execution environments?

Using non-standard evaluation like this can be pretty messy. Rather than trying to capture expressions from promises passed to functions, it's much safer just to pass a formula. For example:
mydata <- data.frame(x=c(1,2,3,4),y=c(5,6,7,8), z=c(9,10,11,12))
my_function <- function(data, equ) {
stopifnot(inherits(equ, "formula"))
eval(equ[[2]], data)
}
new_function <- function(newequ) {
my_function(mydata, newequ)
}
my_function(mydata, ~x+(y^2))
new_function(~x+(y^2))
Or give your function an extra parameter where you can pass an expression instead so you don't have to rely on a promise. This makes it much easier to write other functions that can call your function.
my_function <- function(data, equ, .equ=substitute(equ)) {
eval(.equ, data)
}
new_function <- function(newequ) {
equ <- substitute(newequ)
my_function(mydata, .equ=equ)
}
my_function(mydata, x+(y^2))
new_function(x+(y^2))
my_function(mydata, .equ=quote(x+(y^2)))
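For the literal question of looking one level up, one idiom that is sometimes used is a nested substitute() evaluated in the caller's frame. The sketch below is an illustration under that assumption, not the approach recommended in this answer, and the names inner and outer are hypothetical.

```r
# inner() recovers the expression that was originally passed to its caller
inner <- function(new.equ) {
  # The outer substitute() turns this into the call substitute(equ);
  # eval.parent() then evaluates that call in the caller's frame.
  eval.parent(substitute(substitute(new.equ)))
}

outer <- function(equ) inner(equ)

captured <- outer(x + (y^2))
deparse(captured)
```

This only reaches one level up and breaks if the call chain changes, which is the fragility the formula and .equ approaches avoid.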

R saving function call with formula for reuse in bootstrapping

I've got some code that creates an object from a formula and saves the call for future use, as so:
create_obj <- function(formula, data) {
obj <- list()
# Populate obj with whatever we're interested in
# ...
# Save call for future use
obj$call <- match.call()
obj
}
obj <- create_obj(time ~ sex, data)
If I then bootstrap data I can easily build the model on the new dataset:
data <- data[sample(1:nrow(data), replace=T), ]
new_obj <- eval(obj$call)
However, if I have the formula saved in a variable and I pass the object into a new environment this won't work:
do_stuff <- function(object, newdata) {
data <- newdata[sample(1:nrow(newdata), replace=T), ]
new_object <- eval(object$call)
}
main <- function() {
my_form <- time ~ sex
obj2 <- create_obj(my_form, data)
# obj2$call: 'create_obj(formula = my_form, data = data)'
do_stuff(obj2, data)
}
Error: object my_form not found.
How can I have it so that obj$call saves time~sex rather than my_form? Otherwise I need to pass the formula itself around rather than just the object, limiting the practicality.
The above example isn't reproducible but you can see the same thing with a standard lm call.
EDIT: I've solved the problem, see the accepted answer.

I've solved it by having the constructor function modify the call by evaluating the constant argument in the local environment:
create_obj <- function(formula, data) {
obj <- list()
# Populate obj with whatever we're interested in
# ...
# Save call for future use
func_call <- match.call()
func_call$formula <- eval(formula)
# func_call is now: create_obj(formula=time~sex, data=data)
obj$call <- func_call
obj
}
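Since the example above is not reproducible as posted, here is a minimal self-contained sketch of the same idea. Using mtcars and the formula mpg ~ wt is an assumption made only so the code runs on its own.

```r
create_obj <- function(formula, data) {
  func_call <- match.call()
  func_call$formula <- eval(formula)   # store the formula itself, not the symbol my_form
  list(call = func_call)
}

main <- function() {
  my_form <- mpg ~ wt                  # exists only inside main()
  create_obj(my_form, mtcars)
}

obj <- main()
obj$call                  # the formula appears in the call, not my_form
rebuilt <- eval(obj$call) # works outside main() because my_form is no longer needed
```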

How to write R function that can take either a vector or formula as first argument?

I'm writing a function that I want to be able to take both a vector and a formula as a first argument. If it is a vector, I do some single variable calculations, if it is a formula, I analyze the first variable by the second variable (the second variable would always be a factor).
Here is my current code:
fun = function(formula,data) {
if (class(with(data,formula))=="formula") {
mod = model.frame(formula,data)
n.group=names(mod)[2]
group <- eval(parse(text=paste("mod$",n.group,sep=""))) #x
response <- model.response(mod) # y
return(table(response,group))
}
else {
return(table(with(data,formula)))
}
}
data(iris)
fun(Sepal.Length~Species,iris) # works correctly
fun(Sepal.Length,iris) # returns an error
The return value is just for illustration.
Cheers!

Try this:
fun.formula <- function(formula, data) {
mod = model.frame(formula, data)
n.group <- names(mod)[2]
group <- eval(parse(text=paste("mod$",n.group,sep=""))) #x
response <- model.response(mod) # y
table(response, group)
}
fun <- function(formula, data) {
ret <- try(table(eval(substitute(formula), data)), silent = TRUE)
if (inherits(ret, "try-error")) fun.formula(formula, data) else ret
}
# tests
fun(Sepal.Length ~ Species, iris)
fun(Sepal.Length, iris)
That said, this is a rather unusual interface. It might be better to handle the single-variable case by passing the variable's name as a character string, in which case a more conventional S3 implementation is possible:
fun2 <- function(formula, data, ...) UseMethod("fun2")
fun2.formula <- fun.formula
fun2.character <- function(formula, data) table(data[[formula]])
# tests
fun2(Sepal.Length ~ Species, iris)
fun2("Sepal.Length", iris) # with this approach use a character string
REVISED Now we use try and added an S3 approach.

Ideally, I would have solved this using an S3 approach, but I couldn't figure out how to do that. The following got the job done:
fun <- function(x,data) {
mod = try(model.frame(x,data),silent=T)
if (inherits(mod, "try-error")) {
x=data[,deparse(substitute(x))]
return(table(x))
}
else {
mod = model.frame(x,data)
n.group=names(mod)[2]
group <- eval(parse(text=paste("mod$",n.group,sep=""))) #x
response <- model.response(mod) # y
return(table(response,group))
}
}
fun(Sepal.Length~Species,iris) # works correctly
fun(Sepal.Length,iris) # works!
