I have a series of similar functions that all need to extract some values from a data frame. Something like this:
foo_1 <- function(data, ...) {
  x <- data$x
  y <- data$y
  # some preparatory code common to all foo_X functions
  # .. do some boring stuff with x and y
  # pack and process the result into 'ret'
  return(ret)
}
These functions are then provided as arguments to some other function (let us call it "the master function"; I cannot modify the master function).
However, I wish I could avoid re-writing the same preparatory code in each of these functions. For example, I don't want to use data$x everywhere instead of assigning it once to x, because that makes the boring stuff hard to read. Presently, I need to write x <- data$x (etc.) in every one of the foo_1, foo_2, ... functions, which is annoying and clutters the code. The packing and processing is also common to all the foo_N functions. Other preparatory code includes scaling of variables or regularization of IDs.
What would be an elegant and terse way of doing this?
One possibility is to attach() the data frame (or use with(), as Hong suggested in the answer below), but I don't know what other variables would then be in my namespace: attaching data can mask other variables I use in foo_1. Also, preferably the foo_N functions should be called with explicit parameters, so it is easier to see what they need and what they are doing.
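For example, a minimal sketch of the masking problem (the variable names here are hypothetical):

x <- 10
data <- data.frame(x = 1:3, y = 4:6)
with(data, x + y) # silently uses data$x (1:3), not the outer x
# [1] 5 7 9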
The next possibility I was thinking of was a construction like this:
foo_generator <- function(number) {
  tocall <- switch(number, foo_1, foo_2, foo_3) # etc.
  function(data, ...) {
    x <- data$x
    y <- data$y
    ret <- tocall(x, y, ...)
    # process and pack into ret
    return(ret)
  }
}

foo_1 <- function(x, y, ...) {
  # do some boring stuff
}
Then I can use foo_generator(1) instead of foo_1 as the argument for the master function.
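That is, something like the following (hypothetical, since the master function's actual signature is not shown here):

master_function(foo_generator(1), other_arguments)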
Is there a better or more elegant way? I feel like I am overlooking something obvious here.
You might be overthinking it. You say that the code dealing with preparation and packing is common to all foo_n functions. I assume, then, that # .. do some boring stuff with x and y is where each function differs. If that's the case, then just create a single prep_and_pack function which takes a function as a parameter, and pass in foo_1, foo_2, etc. For example:
prep_and_pack <- function(data, func){
  x <- data$x
  y <- data$y
  # preparatory code here
  xy_output <- func(x, y) # do stuff with x and y
  # code to pack and process into "ret"
  return(ret)
}
Now you can create your foo_n functions that do different things with x and y:
foo_1 <- function(x, y) {
  # .. do some boring stuff with x and y
}

foo_2 <- function(x, y) {
  # .. do some boring stuff with x and y
}

foo_3 <- function(x, y) {
  # .. do some boring stuff with x and y
}
Finally, you can pass multiple calls to prep_and_pack into your master function, where foo_1 etc. are passed in via the func argument:
master_func(prep_and_pack(data = df, func = foo_1),
            prep_and_pack(data = df, func = foo_2),
            prep_and_pack(data = df, func = foo_3))
You could also use switch in prep_and_pack and/or forgo the foo_n functions completely in favor of if-else conditionals to deal with the various cases, but I think the above keeps things nice and clean.
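For illustration, a sketch of that switch variant (hypothetical; it dispatches on a function name instead of a function object):

prep_and_pack <- function(data, func_name){
  x <- data$x
  y <- data$y
  # preparatory code here
  xy_output <- switch(func_name,
                      foo_1 = foo_1(x, y),
                      foo_2 = foo_2(x, y),
                      foo_3 = foo_3(x, y))
  ret <- xy_output # pack and process here
  return(ret)
}
# e.g. prep_and_pack(data = df, func_name = "foo_1")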
The requirements still seem a bit vague to me, but if your code is so similar that you can simply wrap it around a helper function like tocall in your example, and your input is in a list-like structure (like a data frame, which is just a list of columns), then just write all your foo_* functions to take the "spliced" parameters as in your proposed solution, and then use do.call:
foo_1 <- function(x, y) {
  x + y
}

foo_2 <- function(x, y) {
  x - y
}
data <- list(x = 1:2, y = 3:4)
do.call(foo_1, data)
# [1] 4 6
do.call(foo_2, data)
# [1] -2 -2
I'm not sure the following is a good idea. It reminds me a bit of programming with macros. I don't think I would do this. You'd need to document it carefully, because it is unexpected, confusing and not self-explanatory.
If you want to reuse the same code in different functions, it might be an option to create it as an unevaluated call and evaluate that call in the different functions:
prepcode <- quote({
  x <- data$x
  y <- data$y
})
foo_1 <- function(data, ...) {
  eval(prepcode) # runs the preparatory code common to all foo_X functions
  # .. do some boring stuff with x and y
  # pack and process the result into 'ret'
  return(list(x, y))
}
L <- list(x = 1, y = "a")
foo_1(L)
#[[1]]
#[1] 1
#
#[[2]]
#[1] "a"
It might be better to then have prepcode as an argument to foo_1, to make sure there won't be any scoping issues.
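A sketch of that variant, so foo_1 no longer relies on finding prepcode in an enclosing scope (the argument name prep is hypothetical):

foo_1 <- function(data, prep = prepcode, ...) {
  eval(prep) # assigns x and y in foo_1's own frame
  return(list(x, y))
}
foo_1(list(x = 1, y = "a"))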
Use with inside the function:
foo_1 <- function(data, ...) {
  with(data, {
    # .. in here, x and y refer to data$x and data$y
  })
}
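For example, a minimal runnable sketch, assuming the data frame has columns x and y:

foo_1 <- function(data, ...) {
  with(data, {
    x + y # x and y refer to data$x and data$y
  })
}
foo_1(data.frame(x = 1:2, y = 3:4))
# [1] 4 6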
I'm not sure I understand fully, but can't you simply use a function for all common stuff, and then unpack that into the foo_N functions using list2env? For example:
prepFun <- function(data, ...) {
  x <- data$x
  y <- data$y
  # common preparatory work on x and y goes here
  # then pack the results into a named list called ret,
  # so you know specifically what the elements of ret are called
  ret <- list(x = x, y = y)
  return(ret)
}
foo_1 <- function(data, ...) {
  # Get prepFun to do the prepping, then use list2env to pull the result
  # into foo_1's environment. You know which elements are named, so there
  # should be no confusion about what does or does not get masked.
  prepResult <- prepFun(data, ...)
  list2env(prepResult, envir = environment())
  # .. do some boring stuff with x and y
  # pack and process the result into 'ret'
  return(ret)
}
Hope this is what you're looking for!
I think defining a function factory for this task is a bit overkill and confusing. You can define a general function and use purrr::partial() on it when passing it to your master function.
Something like:
foo <- function(data, ..., number, foo_funs) {
  tocall <- foo_funs[[number]]
  ret <- with(data[c("x", "y")], tocall(x, y, ...))
  # process and pack into ret
  return(ret)
}
foo_1 <- function(x, y, ...) {
  # do some boring stuff
}

foo_funs <- list(foo_1, foo_2, ...)
Then call master_fun(fun = purrr::partial(foo, number = 1, foo_funs = foo_funs), ...)
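In case purrr::partial() is unfamiliar, here is a minimal illustration of what it does (the functions add and add_one are hypothetical): it pre-fills some arguments and returns a new function.

add <- function(x, y) x + y
add_one <- purrr::partial(add, y = 1)
add_one(5)
# [1] 6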
Another possibility is to use list2env, which saves the components of a list into a specified environment:
foo_1 <- function(data){
  list2env(data, envir = environment())
  x + y
}
foo_1(data.frame(x = 1:2, y = 3:4))
See also this question.
Related
Is there a way to use a function-call to set up a collection of variables with new names?
What I'd like is something like the following:
helper <- function (x) {
  y <<- x + 1
  NULL
}

main <- function (x) {
  helper(x)
  return(y)
}
However, there are two problems with this:
the code means that y is defined in the global environment, which I don't want;
I'm also aware that the <<- operator is not kosher as far as CRAN is concerned.
Essentially I'd like to make my function main cleaner by passing a lot of the work it does to helper. Is there any legitimate way to do this for a package that I eventually want to be on CRAN?
I don't think your approach is in any way really sensible (why not use a list?), but if you really want to do that, you can use assign to assign variables in arbitrary environments, e.g. the parent frame:
helper <- function(x) {
  assign('y', x + 1, envir=parent.frame())
  NULL
}

main <- function(x) {
  helper(x)
  return(y)
}
main(1)
# [1] 2
You can use the strategy of having helper return a list with the calculated variables, and then use them:
helper <- function (x) {
  y <- x + 1
  list(y = y)
}

main <- function (x) {
  vars <- helper(x)
  return(vars$y)
}
If you are going to use y often and don't want to always type vars$y, you could assign it locally:
main <- function (x) {
  vars <- helper(x)
  y <- vars$y
  return(y)
}
In contrast to assigning variables in arbitrary environments, this makes it way easier to reason about what your code does.
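For example, with the helper and main defined above:

main(1)
# [1] 2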
I am facing a problem with the apply function passing on arguments to a function when they are not needed. I understand that apply doesn't know what to do with the optional arguments and just passes them on to the function.
But anyhow, here is what I would like to do:
First I want to specify a list of functions that I would like to use.
functions <- list(length, sum)
Then I would like to create a function which apply these specified functions on a data set.
myFunc <- function(data, functions) {
  for (i in 1:length(functions)) print(apply(X=data, MARGIN=2, FUN=functions[[i]]))
}
This works fine.
data <- cbind(rnorm(100), rnorm(100))
myFunc(data, functions)
[1] 100 100
[1] -0.5758939 -5.1311173
But I would also like to use additional arguments for some functions, e.g.
power <- function(x, p) x^p
This doesn't work as I want it to. If I modify myFunc to:
myFunc <- function(data, functions, ...) {
  for (i in 1:length(functions)) print(apply(X=data, MARGIN=2, FUN=functions[[i]], ...))
}
and functions as
functions <- list(length, sum, power)
and then try my function I get
myFunc(data, functions, p=2)
Error in FUN(newX[, i], ...) :
2 arguments passed to 'length' which requires 1
How may I solve this issue?
Sorry for the wall of text. Thank you!
You can use Curry from the functional package to fix the parameter you want, put the resulting function in the list of functions you want to apply, and finally iterate over this list of functions:
library(functional)
power <- function(x, p) x^p
funcs = list(length, sum, Curry(power, p=2), Curry(power, p=3))
lapply(funcs, function(f) apply(data, 2, f))
With your code you can use:
functions <- list(length, sum, Curry(power, p=2))
myFunc(data, functions)
I'd advocate using Colonel's Curry approach, but if you want to stick to base R you can always:
funcs <- list(length, sum, function(x) power(x, 2))
This is roughly what Curry ends up doing.
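For illustration, a hypothetical base-R sketch of the same idea:

curry <- function(f, ...) {
  fixed <- list(...)
  function(x) do.call(f, c(list(x), fixed))
}
pow2 <- curry(power, p = 2)
pow2(1:3)
# [1] 1 4 9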
One option is to pass the parameters in a list with the arguments needed for each function. You can add those parameters to the others needed for apply using c, and then use do.call to call the function, something like this. I also wrap all the output in a list here rather than using print; your usage may vary.
power <- function(x, p) x^p
myFunc <- function(data, functions, parameters) {
  lapply(seq_along(functions), function(i) {
    p0 <- list(X=data, MARGIN=2, FUN=functions[[i]])
    do.call(apply, c(p0, parameters[[i]]))
  })
}
d <- matrix(1:6, nrow=2)
functions <- list(length, sum, power)
parameters <- list(NULL, NULL, list(p = 3))
myFunc(d, functions, parameters)
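For the example inputs above, this should print the column lengths, the column sums, and the cubed columns:

#[[1]]
#[1] 2 2 2
#
#[[2]]
#[1]  3  7 11
#
#[[3]]
#     [,1] [,2] [,3]
#[1,]    1   27  125
#[2,]    8   64  216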
You can use the lazyeval package:
library(lazyeval)
my_evaluate <- function(data, expressions, ...) {
  lapply(expressions, function(e) {
    apply(data, MARGIN=2, FUN=function(x) {
      lazy_eval(e, c(list(x=x), list(...)))
    })
  })
}
And use it like this:
my_expressions <- lazy_dots(sum = sum(x), sumpow = sum(x^p), length_k = length(x) * k)
data <- cbind(rnorm(100), rnorm(100))
my_evaluate(data, my_expressions, p = 2, k = 2)
I would like to write a wrapper function for two functions that take optional arguments.
Here is an example of a function fun to wrap funA and funB
funA <- function(x = 1, y = 1) return(x+y)
funB <- function(z = c(1, 1)) return(sum(z))
fun <- function(x, y, z)
I would like fun to return x+y if x and y are provided, and sum(z) if a vector z is provided.
I have tried to see how the lm function takes such optional arguments, but it is not clear exactly how, e.g., match.call is being used here.
After finding related questions (e.g. How to use R's ellipsis feature when writing your own function? and a question on using substitute to get argument names), I have come up with a workable solution.
My solution has just been to use
fun <- function(...){
  inputs <- list(...)
  if (all(c("x", "y") %in% names(inputs))) {
    ans <- funA(inputs$x, inputs$y)
  } else if ("z" %in% names(inputs)) {
    ans <- funB(inputs$z)
  }
  ans
}
Is there a better way?
Note: Perhaps this question can be closed as a duplicate, but hopefully it can serve a purpose in guiding other users to a good solution: it would have been helpful to have expanded my search to include ellipsis and substitute, in addition to match.call.
Use missing. This returns funA(x, y) if both x and y are provided, returns funB(z) if they are not but z is provided, and returns NULL if none of them are provided:
fun <- function(x, y, z) {
  if (!missing(x) && !missing(y)) {
    funA(x, y)
  } else if (!missing(z)) {
    funB(z)
  }
}
This seems to answer your question as stated but note that the default arguments in funA and funB are never used so perhaps you really wanted something different?
Note the fun that is provided in the question only works if the arguments are named whereas the fun here works even if they are provided positionally.
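For example, a quick check of both branches with the funA and funB defined in the question:

fun(x = 1, y = 2) # funA: 3
fun(1, 2)         # positional arguments also work here: 3
fun(z = 1:10)     # funB: 55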
I would do something like the following, using match.call. This is similar to your solution but more robust.
fun <- function(...){
  arg <- as.list(match.call())[-1]
  f <- ifelse(length(arg) > 1, "funA", "funB")
  do.call(f, arg)
}
fun(x=1, y=2) ## or fun(1,2); no need to give named arguments
[1] 3
fun(z=1:10) ## or fun(1:10)
[1] 55
The function testfun1, defined below, does what I want it to do. (For the reasoning behind all this, see the background info below the code example.) What I wanted to ask is why what I tried in testfun2 doesn't work. To me, both appear to be doing the exact same thing. As shown by the print in testfun2, the evaluation of the helper function inside testfun2 takes place in the correct environment, yet the variables from the main function environment get magically passed to the helper function in testfun1, but not in testfun2. Does anyone know why?
helpfun <- function(){
  x <- x^2 + y^2
}
testfun1 <- function(x,y){
  xy <- x*y
  environment(helpfun) <- sys.frame(sys.nframe())
  x <- eval(as.call(c(as.symbol("helpfun"))))
  return(list(x=x,xy=xy))
}
testfun1(x = 2,y = 1:3)
## works as intended
eval.here <- function(fun){
  environment(fun) <- parent.frame()
  print(environment(fun))
  eval(as.call(c(as.symbol(fun))))
}

testfun2 <- function(x,y){
  print(sys.frame(sys.nframe()))
  xy <- x*y
  x <- eval.here("helpfun")
  return(list(x=x,xy=xy))
}
testfun2(x = 2,y = 1:3)
## helpfun can't find variable 'x' despite having the same environment as in testfun1...
Background info: I have a large R code base in which I want to call helper functions inside my main function. They alter variables of the main function's environment. The purpose of all this is mainly to unclutter my code. (The main function's code is currently over 2000 lines, with many calls to various helper functions which are themselves 40-150 lines long...)
Note that the number of arguments to my helper functions is very high, so the traditional explicit passing of function arguments ("helpfun(arg1 = arg1, arg2 = arg2, ..., arg50 = arg50)") would be cumbersome and doesn't yield the uncluttering of the code that I am aiming for. Therefore, I need to pass the variables from the parent frame to the helper functions anonymously.
Use this instead (the string first has to be turned into the actual function object with get; only then can its environment be replaced):
eval.here <- function(fun){
  fun <- get(fun)
  environment(fun) <- parent.frame()
  print(environment(fun))
  fun()
}
Result:
> testfun2(x = 2,y = 1:3)
<environment: 0x0000000013da47a8>
<environment: 0x0000000013da47a8>
$x
[1] 5 8 13
$xy
[1] 2 4 6
I am working on a project to profile function outputs, so I need to pass a function in as an argument in R. To clarify, I have a varying number of models, and am not looking for assistance on setting up the models, just on passing the model function names into the scoring function.
This works for a direct call, but I want to make it more generic for building out the module. Here is a brief example:
#create a test function:
model1 = function(y,X){
  fit = lm(y~X)
  output = data.frame(resid = fit$residuals)
}
#score function:
score = function(y,X,model){
  y = as.matrix(y)
  X = as.matrix(X)
  fitModel = model(y,X)
  yhat = y - fitModel$resid
  output = data.frame(yhat=yhat)
}
I can call this code with valid y and X mats with
df <- data.frame(x=rnorm(5),y=runif(5))
scoreModel1 = score(df$y,df$x,model1)
But what I am looking for is a method of listing all of the models, and looping through, and/or calling the score function in a generic way. For instance:
models = c("model1")
scoreModel1 = score(df$y,df$x,models[1])
The error that I get with the above code is
Error in score(y, X, model) :
could not find function "model"
I have played around with as.function(), and with listing and unlisting the args, but nothing works. For instance, all of the following have rendered the same error as above:
models = c(model1)
models = list(model1)
models = list("model1")
Thank you in advance for your help.
For anyone arriving here from google wondering how to pass a function as an argument, here's a great example:
randomise <- function(f) f(runif(1e3))
randomise(mean)
#> [1] 0.5029048
randomise(sum)
#> [1] 504.245
It's from Hadley Wickham's book Advanced R.
Your list objects can simply be the functions themselves. Maybe you can get some use out of this structure, or else take Roland's advice and pass formulas. Richiemorrisroe's answer is probably cleaner.
fun1 <- function(x,y){
  x + y
}

fun2 <- function(x,y){
  x^y
}

fun3 <- function(x,y){
  x*y
}
models <- list(fun1 = fun1, fun2 = fun2, fun3 = fun3)
models[["fun1"]](1,2)
[1] 3
models[[1]](1,2)
[1] 3
lapply(models, function(FUN) FUN(x = 1, y = 2))
$fun1
[1] 3
$fun2
[1] 1
$fun3
[1] 2
match.fun is your friend. It is what apply, tapply, et al. use for the same purpose. Note that if you need to pass arguments to the model fitting functions, then you will either need to bundle all of these up into a function, like so: function(x) sum(x==0, na.rm=TRUE), or else supply them as a list and use do.call, like so: do.call(myfunc, funcargs).
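As a minimal sketch (a hypothetical adaptation of the score function from the question), match.fun accepts either a function or its name as a string:

score = function(y, X, model){
  model = match.fun(model) # resolves "model1" (a string) or model1 (a function)
  y = as.matrix(y)
  X = as.matrix(X)
  fitModel = model(y, X)
  yhat = y - fitModel$resid
  data.frame(yhat = yhat)
}
scoreModel1 = score(df$y, df$x, "model1") # a quoted function name now works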
Hope this helps.
Another response:
models = list(model1)
scoreModel1 = score(df$y,df$x,models[[1]])
An example of passing a function in as a variable:
f_add <- function(x,y){ x + y }
f_subtract <- function(x,y){ x - y }
f_multi <- function(x,y){ x * y }

operation <- function(FUN, x, y){ FUN(x, y) }
operation(f_add, 9,2)
#> [1] 11
operation(f_subtract, 17,5)
#> [1] 12
operation(f_multi,6,8)
#> [1] 48
Good luck!