The following works:
plot(Sepal.Length ~ Petal.Width, data = iris)
abline(lm(Sepal.Length ~ Petal.Width, data = iris))
But the following code does not work:
str = "Sepal.Length ~ Petal.Width, data = iris"
plot(str)
abline(lm(str))
I tried deparse(substitute), as.formula and eval, but none of them work.
Using str from the question try this:
# fun and args should each be a character string
run <- function(fun, args) eval(parse(text = sprintf("%s(%s)", fun, args)))
run("plot", str)
abline(run("lm", str))
Or try this:
`%(%` <- function(fun, args) run(deparse(substitute(fun)), args)
plot %(% str
abline(lm %(% str)
Note that this approach can handle the situation where there are commas in an argument (as opposed to argument separators) and does not make use of any external packages.
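To make the mechanism concrete, here is a minimal sketch of what run() does internally for the lm case: sprintf() pastes the function name and the argument string into the text of an ordinary call, and parse()/eval() then execute that text. The variable names call_text and fit are only illustrative. Because eval(parse()) executes whatever the string contains, this is only reasonable for strings you trust.

```r
str <- "Sepal.Length ~ Petal.Width, data = iris"

# Build the text of the call exactly as run("lm", str) would
call_text <- sprintf("%s(%s)", "lm", str)
call_text
# "lm(Sepal.Length ~ Petal.Width, data = iris)"

# Parse the text into an expression and evaluate it
fit <- eval(parse(text = call_text))
class(fit)
# "lm"
```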
Try parsing the argument string and building the arguments from it:
library(stringr)  # provides str_trim()
fun_str <- function(fun, str_arg) {
  ## split the arguments, which are separated by commas
  m <- as.list(strsplit(str_arg, ',')[[1]])
  args <- lapply(m, function(x) {
    ## split "name = value" and remove any extra whitespace
    arg <- str_trim(strsplit(x, '=')[[1]])
    if (arg[1] == "data") get(arg[2], parent.frame())
    else if (grepl('~', x)) as.formula(x)
  })
  do.call(fun, args)
}
Then call it:
fun_str("plot",str)
fun_str("lm",str)
Here's another alternative. You could use a call object to represent the data argument and then evaluate it in the argument list.
f <- formula("Sepal.Length ~ Petal.Width")
cl <- call("=", "data", iris)
plot(f, eval(cl))
abline(lm(f, eval(cl)))
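As a small check of what the call object above does (a sketch; the scratch environment e is introduced here only to keep the workspace clean): evaluating cl performs the assignment data = iris and returns iris, so plot(f, eval(cl)) ends up passing the data frame as the second positional argument.

```r
cl <- call("=", "data", iris)

# Evaluate the assignment in a scratch environment
e <- new.env()
res <- eval(cl, envir = e)

identical(res, iris)                          # the value of the assignment is iris
exists("data", envir = e, inherits = FALSE)   # a variable named data was created in e
```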
It looks like this alternate solution will also work with the original str vector.
str <- "Sepal.Length ~ Petal.Width, data = iris"
s <- strsplit(str, ", data = ")[[1]]
with(setNames(as.list(s), c("formula", "data")), {
getd <- get(data, parent.frame())
plot(f <- formula(formula), data = getd)
abline(lm(f, data = getd))
})
Related
I am trying to write a function in R that:
1) Receives a data frame and column name as parameters.
2) Performs an operation on the column in the data frame.
func <- function(col, df)
{
col = deparse(substitute(col))
print(paste("Levels: ", levels(df[[col]])))
}
func(Col1, DF)
func(Col2, DF)
mapply(func, colnames(DF)[1:2], DF)
Output
> func(Col1, DF)
[1] "Levels: GREEN" "Levels: YELLOW"
> func(Col2, DF)
[1] "Levels: 0.1" "Levels: 1"
> mapply(func, colnames(DF)[1:2], DF)
Error in `[[.default`(df, col) : subscript out of bounds
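Two things:
1) In your function func, deparse(substitute(col)) captures the expression that was typed for col, so it only yields a column name when col is passed as a bare name, as in func(Col1, DF). In your mapply() call the column names are character strings and func is invoked internally by mapply(), so the captured expression is not a column name and the lookup fails. func('Col1', DF) does not work either, because the deparsed text then includes the quote characters.
2) In a mapply() call, every argument passed through ... must be a vector or a list, and mapply() iterates over all of them in parallel. Passing DF directly therefore hands func one column at a time instead of the whole data frame. So you need to pass list(DF, DF), or, if you don't want to replicate it, remove the df argument from your function func.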
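The first point can be seen with a small sketch (show_col is a hypothetical helper that only returns the deparsed argument; iris column names stand in for the DF from the question):

```r
# Return the text of whatever expression was supplied for col
show_col <- function(col) deparse(substitute(col))

bare   <- show_col(Species)     # bare name: gives "Species"
quoted <- show_col("Species")   # string: the quote characters become part of the text

# Inside mapply() the function sees an internal expression, not the column name
via_mapply <- unname(mapply(show_col, c("Species", "Sepal.Length")))

bare
quoted
via_mapply
```

Only the bare-name call yields something that can be used directly as a column name, which is why the deparse(substitute()) idiom does not combine well with mapply().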
This is one alternative that should work:
func <- function(col, df)
{
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, colnames(DF)[1:2], list(DF, DF))
Please have a look at the last comment of #demarsylvain - maybe a copy-paste error on your side, you should have done:
func <- function(col,df) {
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
you did:
func <- function(col) {
print(paste("Levels: ", levels(df[,col])))
}
mapply(FUN = func, c('Species', 'Species'), list(iris, iris))
Please upvote and accept the solution of #demarsylvain, it works
EDIT to address your comment:
To have a generic version for an arbitrary list of column names you can use this code, sorry for the loop :)
func <- function(col,df) {
print(paste("Levels: ", levels(df[,col])))
}
cnames = colnames(iris)
i <- 1
l = list()
while(i <= length(cnames)) {
l[[i]] <- iris
i <- i + 1
}
mapply(FUN = func, cnames, l)
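The loop that replicates the data frame can probably be avoided with the MoreArgs argument of mapply(), which passes the same value to every call. A minimal sketch, assuming the function returns the strings instead of printing them and using SIMPLIFY = FALSE so the result stays a named list:

```r
func <- function(col, df) {
  paste("Levels:", levels(df[[col]]))
}

# df is supplied once through MoreArgs; only the column names are iterated over
res <- mapply(FUN = func, colnames(iris),
              MoreArgs = list(df = iris), SIMPLIFY = FALSE)

res[["Species"]]
# "Levels: setosa" "Levels: versicolor" "Levels: virginica"
```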
In R, I'm trying to pass arguments from an outer function to an inner function, where they will be evaluated. This fails, presumably because the arguments are not evaluated when they are passed.
Here are two examples; one uses match.call() to grab the arguments of the outer function and pass them along, the other manually binds the arguments of the outer function in a list, then calls the inner function. I suspect that I'm failing to grasp some of the intricacies of match.call() and lazy evaluation. Is there an elegant way to fix this problem?
# Use match.call to pass arguments to inner_function
outer_fun_a <- function(formula, data){
args <- as.list(match.call())[-1]
mods <- names(get_all_vars(formula, data = data))[-1]
inner_function(mods, args)
}
# Make a list of the arguments of outer_fun_b
outer_fun_b <- function(formula, data){
args <- list(formula = formula, data = data)
mods <- names(get_all_vars(formula, data = data))[-1]
inner_function(mods, args)
}
inner_function <- function(modvars, args){
args$formula <- update(args$formula, paste0("~ ", paste(modvars, collapse = " + ")))
do.call(lm, args)
}
df <- iris
# This fails, because the arguments in args have not been evaluated
outer_fun_a(as.formula(paste0("Sepal.Length ~ ", paste(names(df)[2:5], collapse = " + "))),
df)
# This succeeds
outer_fun_b(as.formula(paste0("Sepal.Length ~ ", paste(names(df)[2:5], collapse = " + "))),
df)
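One possible way to make the match.call() variant behave like the list variant is to evaluate the captured arguments in the caller's frame before passing them on. This is only a sketch: outer_fun_eval and caller are names introduced here, and inner_function is repeated from the question so the block runs on its own. Note that parent.frame() is captured before lapply() is called; evaluated lazily inside lapply() it would refer to a different frame.

```r
inner_function <- function(modvars, args) {
  args$formula <- update(args$formula, paste0("~ ", paste(modvars, collapse = " + ")))
  do.call(lm, args)
}

outer_fun_eval <- function(formula, data) {
  caller <- parent.frame()   # frame in which the arguments were written
  # match.call() returns unevaluated expressions; evaluate each one in the caller
  args <- lapply(as.list(match.call())[-1], eval, envir = caller)
  mods <- names(get_all_vars(formula, data = data))[-1]
  inner_function(mods, args)
}

df <- iris
fit <- outer_fun_eval(
  as.formula(paste0("Sepal.Length ~ ", paste(names(df)[2:5], collapse = " + "))),
  df
)
class(fit)
```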
I am trying to write a function around "lm" using tidyeval (non-standard evaluation). Using base R NSE, it works:
lm_poly_raw <- function(df, y, x, degree = 1, ...){
lm_formula <-
substitute(expr = y ~ poly(x, degree, raw = TRUE),
env = list(y = substitute(y),
x = substitute(x),
degree = degree))
eval(lm(lm_formula, data = df, ...))
}
lm_poly_raw(mtcars, hp, mpg, degree = 2)
However, I have not figured out how to write this function using tidyeval and rlang. I assume that substitute should be replaced by enquo, and eval by !!. There are some hints in Hadley's Adv-R, but I could not figure it out.
Here is the kind of formula constructor that might make its way in rlang in the future:
library(rlang)  # provides enquo(), get_env(), quo_expr(), new_formula(), sym()
f <- function(x, y, flatten = TRUE) {
x <- enquo(x)
y <- enquo(y)
# Environments should be the same
# They could be different if forwarded through dots
env <- get_env(x)
stopifnot(identical(env, get_env(y)))
# Flatten the quosures. This warns the user if nested quosures are
# found. Those are not supported by functions like lm()
if (flatten) {
x <- quo_expr(x, warn = TRUE)
y <- quo_expr(y, warn = TRUE)
}
new_formula(x, y, env = env)
}
# This can be used for unquoting symbols
var <- "cyl"
lm(f(disp, am + (!! sym(var))), data = mtcars)
The tricky parts are:
The LHS and RHS could come from different environments if forwarded through different layers of .... We need to check for this.
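We need to check that the user doesn't unquote quosures, because lm() and similar functions do not support them. quo_expr() flattens all the quosures and optionally warns if some were found. (In later rlang releases quo_expr() was deprecated in favour of quo_squash().)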
I would like to have a function like my_lm, exemplified below:
library(rlang)
base_formula <- new_formula(lhs = quote(potato),
rhs = quote(Sepal.Width + Petal.Length))
my_lm <- function(response) {
lm(formula = update(old = base_formula, new = quote(response) ~ . ),
data = iris)
}
my_lm(response = Sepal.Length)
But I am met with the following error:
Error in model.frame.default(formula = update(old = base_formula, new = enquo(response) ~ :
object is not a matrix
I suspect I am misusing rlang, but I can't seem to figure out what combination of quoting, unquoting and formulating would solve this problem.
EDIT: desired output is as if I ran:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length,
data = iris)
EDIT2: I should also clarify, I'm really interested in a solution that uses rlang to solve this problem through update more so than a solution that uses paste, gsub, and formula.
Here's something that works
base_formula <- new_formula(lhs = quote(potato),
rhs = quote(Sepal.Width + Petal.Length))
my_lm <- function(response) {
newf <- new_formula(get_expr(enquo(response)), quote(.))
lm(formula = update(old = base_formula, new = newf),
data = iris)
}
my_lm(response = Sepal.Length)
It seems to be a bit of a mess because quosures are also basically formulas and you're trying to make a regular formula with them. And then new_formula doesn't seem to allow for !! expansion.
If you are really just interested in changing the left hand side, something like this might be more direct
my_lm <- function(response) {
newf <- base_formula
f_lhs(newf) <- get_expr(enquo(response))
lm(formula = get_expr(newf),
data = iris)
}
This is closely related to the question: How do I pass ``...`` to a new environment in R?
And the thread here.
My ultimate goal is to be able to have a function that:
operates on a function and returns a function
The return function creates a new environment with .GlobalEnv as its parent
it evaluates the argument function inside the new environment.
And it solves the save-size problem described below.
The save-size problem is that lm objects (and others, for example ggplot objects) keep a reference to their calling environment, which can often contain irrelevant information. The goal is to have a convenient wrapper that fixes this problem. To demonstrate:
saveSize <- function (object) {
tf <- tempfile(fileext = ".RData")
on.exit(unlink(tf))
save(object, file = tf)
file.size(tf)
}
tmp_fun <- function(){
iris_big <- lapply(1:10000, function(x) iris)
lm(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun()
object.size(out)
# 48008
saveSize(out)
# 1002448 - Far too large as it contains iris_big.
Bill Dunlap proposes this solution which works:
tmp_fun_Bill <- function(){
iris_big <- lapply(1:10000, function(x) iris)
env <- new.env(parent = globalenv())
with(env, lm(Sepal.Length ~ Sepal.Width, data = iris))
}
out <- tmp_fun_Bill()
object.size(out)
# 48008
saveSize(out)
# 4478 - this works!
I would like to generalize Bill's approach into one of these functions that return functions (like purrr::safely).
My best attempt, with help from #MrFlick:
in_new_env <- function(.f){
function(...) {
params <- list(...)
env <- new.env(parent = globalenv())
# Change the environment of any formula objects
params <- lapply(params, function(x) {if (inherits("x","formula")) {environment(x)<-env}; x})
assign(".params.", params, envir = env)
env$.f <- .f
evalq(do.call(".f", .params.), envir=env)
}
}
tmp_fun_me <- function(){
iris_big <- lapply(1:10000, function(x) iris)
in_new_env(lm)(Sepal.Length ~ Sepal.Width, data = iris)
}
out <- tmp_fun_me()
object.size(out)
# 48008
saveSize(out)
# 1002448 - too big again
Can someone point out what's going wrong here?
The problem is really that the formula captures the current environment. Here's a function that sets the environment of a formula to a new, empty environment whose parent is the global environment:
dropenv <- function(x) {
env <- new.env(parent = globalenv())
if (inherits(x,"formula")) {
environment(x)<-env
}
x
}
tmp_fun_drop <- function(){
iris_big <- lapply(1:10000, function(x) iris)
lm(dropenv(Sepal.Length ~ Sepal.Width), data = iris)
}
But this requires evaluating and parsing the formula argument to the lm() function. Here's a possible workaround for your desired calling style:
in_new_env <- function(.f){
function(formula, ...) {
formula <- dropenv(formula)
.f(formula, ...)
}
}
tmp_fun_drop <- function(){
iris_big <- lapply(1:10000, function(x) iris)
in_new_env(lm)(Sepal.Length ~ Sepal.Width, data = iris)
}
Now the function returned by in_new_env assumes that the first parameter will be a formula and will clear out the environment for that formula.
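The effect on the formula's environment can be checked directly with a small sketch (make_formula and big are hypothetical stand-ins for tmp_fun_drop and iris_big; dropenv is repeated in compact form so the block runs on its own). The last line also shows that inherits("x", "formula"), as written in the question's first attempt, tests the literal string "x" and is therefore always FALSE, which may be part of why that attempt never replaced the environment.

```r
dropenv <- function(x) {
  if (inherits(x, "formula")) environment(x) <- new.env(parent = globalenv())
  x
}

make_formula <- function() {
  big <- numeric(1e5)            # large local object, like iris_big
  Sepal.Length ~ Sepal.Width     # the formula captures this function's frame
}

f <- make_formula()
exists("big", envir = environment(f), inherits = FALSE)            # TRUE
exists("big", envir = environment(dropenv(f)), inherits = FALSE)   # FALSE

inherits("x", "formula")   # FALSE: a character string is never a formula
```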