Passing arguments to iterated function through apply - r

I have a function like this dummy-one:
FUN <- function(x, parameter){
if (parameter == 1){
z <- DO SOMETHING WITH "x"}
if (parameter ==2){
z <- DO OTHER STUFF WITH "x"}
return(z)
}
Now, I would like to use the function on a dataset using apply.
The problem is, that apply(data,1,FUN(parameter=1))
wont work, as FUN doesn't know what "x" is.
Is there a way to tell apply to call FUN with "x" as the current row/col?
`

You want apply(data,1,FUN,parameter=1). Note the ... in the function definition:
> args(apply)
function (X, MARGIN, FUN, ...)
NULL
and the corresponding entry in the documentation:
...: optional arguments to ‘FUN’.

You can make an anonymous function within the call to apply so that FUN will know what "x" is:
apply(data, 1, function(x) FUN(x, parameter = 1))
See ?apply for examples at the bottom that use this method.

Here's a practical example of passing arguments using the ... object and *apply. It's slick, and this seemed like an easy example to explain the use. An important point to remember is when you define an argument as ... all calls to that function must have named arguments. (so R understands what you're trying to put where). For example, I could have called times <- fperform(longfunction, 10, noise = 5000) but leaving off noise = would have given me an error because it's being passed through ... My personal style is to name all of the arguments if a ... is used just to be safe.
You can see that the argument noise is being defined in the call to fperform(FUN = longfunction, ntimes = 10, noise = 5000) but isn't being used for another 2 levels with the call to diff <- rbind(c(x, runtime(FUN, ...))) and ultimately fun <- FUN(...)
# Made this to take up time
longfunction <- function(noise = 2500, ...) {
lapply(seq(noise), function(x) {
z <- noise * runif(x)
})
}
# Takes a function and clocks the runtime
runtime <- function(FUN, display = TRUE, ...) {
before <- Sys.time()
fun <- FUN(...)
after <- Sys.time()
if (isTRUE(display)) {
print(after-before)
}
else {
after-before
}
}
# Vectorizes runtime() to allow for multiple tests
fperform <- function(FUN, ntimes = 10, ...) {
out <- sapply(seq(ntimes), function(x) {
diff <- rbind(c(x, runtime(FUN, ...)))
})
}
times <- fperform(FUN = longfunction, ntimes = 10, noise = 5000)
avgtime <- mean(times[2,])
print(paste("Average Time difference of ", avgtime, " secs", sep=""))

Related

ISO a good way to let a function accept a mix of supplied arguments, arguments from a list, and defaults

I would like to have a function accept arguments in the usual R way, most of which will have defaults. But I would also like it to accept a list of named arguments corresponding to some or some or all of the formals. Finally, I would like arguments supplied to the function directly, and not through the list, to override the list arguments where they conflict.
I could do this with a bunch of nested if-statements. But I have a feeling there is some elegant, concise, R-ish programming-on-the-language solution -- probably multiple such solutions -- and I would like to learn to use them. To show the kind of solution I am looking for:
> arg_lst <- list(x=0, y=1)
> fn <- function(a_list = NULL, x=2, y=3, z=5, ...){
<missing code>
print(c(x, y, z))
}
> fn(a_list = arg_list, y=7)
Desired output:
x y z
0 7 5
I like a lot about #jdobres's approach, but I don't like the use of assign and the potential scoping breaks.
I also don't like the premise, that a function should be written in a special way for this to work. Wouldn't it be better to write a wrapper, much like do.call, to work this way with any function? Here is that approach:
Edit: solution based off of purrr::invoke
Thinking a bit more about this, purrr::invoke almost get's there - but it will result in an error if a list argument is also passed to .... But we can make slight modifications to the code and get a working version more concisely. This version seems more robust.
library(purrr)
h_invoke = function (.f, .x = NULL, ..., .env = NULL) {
.env <- .env %||% parent.frame()
args <- c(list(...), as.list(.x)) # switch order so ... is first
args = args[!duplicated(names(args))] # remove duplicates
do.call(.f, args, envir = .env)
}
h_invoke(fn, arg_list, y = 7)
# [1] 0 7 5
Original version borrowing heavily from jdobres's code:
hierarchical_do_call = function(f, a_list = NULL, ...){
formal_args = formals() # get the function's defined inputs and defaults
formal_args[names(formal_args) %in% c('f', 'a_list', '...')] = NULL # remove these two from formals
supplied_args <- as.list(match.call())[-1] # get the supplied arguments
supplied_args[c('f', 'a_list')] = NULL # ...but remove the argument list and the function
a_list[names(supplied_args)] = supplied_args
do.call(what = f, args = a_list)
}
fn = function(x=2, y=3, z=5) {
print(c(x, y, z))
}
arg_list <- list(x=0, y=1)
hierarchical_do_call(f = fn, a_list = arg_list, y=7)
# x y z
# 0 7 5
I'm not sure how "elegant" this is, but here's my best attempt to satisfy the OP's requirements. The if/else logic is actually pretty straightforward (no nesting needed, per se). The real work is in collecting and sanitizing the three different input types (formal defaults, the list object, and any supplied arguments).
fn <- function(a_list = NULL, x = 2, y = 3, z = 5, ...) {
formal_args <- formals() # get the function's defined inputs and defaults
formal_args[names(formal_args) %in% c('a_list', '...')] <- NULL # remove these two from formals
supplied_args <- as.list(match.call())[-1] # get the supplied arguments
supplied_args['a_list'] <- NULL # ...but remove the argument list
# for each uniquely named item among the 3 inputs (argument list, defaults, and supplied args):
for (i in unique(c(names(a_list), names(formal_args), names(supplied_args)))) {
if (!is.null(supplied_args[[i]])) {
assign(i, supplied_args[[i]])
} else if (!is.null(a_list[[i]])) {
assign(i, a_list[[i]])
}
}
print(c(x, y, z))
}
arg_lst <- list(x = 0, y = 1)
fn(a_list = arg_lst, y=7)
[1] 0 7 5
With a little more digging into R's meta-programming functions, it's actually possible to pack this hierarchical assignment into its own function, which is designed to operate on the function environment that called it. This makes it easier to reuse this functionality, but it definitely breaks scope and should be considered dangerous.
The "hierarchical assignment" function, mostly the same as before:
hierarchical_assign <- function(a_list) {
formal_args <- formals(sys.function(-1)) # get the function's defined inputs and defaults
formal_args[names(formal_args) %in% c('a_list', '...')] <- NULL # remove these two from formals
supplied_args <- as.list(match.call(sys.function(-1), sys.call(-1)))[-1] # get the supplied arguments
supplied_args['a_list'] <- NULL # ...but remove the argument list
# for each uniquely named item among the 3 inputs (argument list, defaults, and supplied args):
for (i in unique(c(names(a_list), names(formal_args), names(supplied_args)))) {
if (!is.null(supplied_args[[i]])) {
assign(i, supplied_args[[i]], envir = parent.frame())
} else if (!is.null(a_list[[i]])) {
assign(i, a_list[[i]], envir = parent.frame())
}
}
}
And the usage. Note that the the calling function must have an argument named a_list, and it must be passed to hierarchical_assign.
fn <- function(a_list = NULL, x = 2, y = 3, z = 5, ...) {
hierarchical_assign(a_list)
print(c(x, y, z))
}
[1] 0 7 5
I think do.call() does exactly what you want. It accepts a function and a list as arguments, the list being arguments for the functions. I think you will need a wrapper function to create this behavior of "overwriting defaults"

Pass optional arguments to function, three dots

I'm confused how ... works.
tt = function(...) {
return(x)
}
Why doesn't tt(x = 2) return 2?
Instead it fails with the error:
Error in tt(x = 2) : object 'x' not found
Even though I'm passing x as argument ?
Because everything you pass in the ... stays in the .... Variables you pass that aren't explicitly captured by a parameter are not expanded into the local environment. The ... should be used for values your current function doesn't need to interact with at all, but some later function does need to use do they can be easily passed along inside the .... It's meant for a scenario like
ss <- function(x) {
x
}
tt <- function(...) {
return(ss(...))
}
tt(x=2)
If your function needs the variable x to be defined, it should be a parameter
tt <- function(x, ...) {
return(x)
}
If you really want to expand the dots into the current environment (and I strongly suggest that you do not), you can do something like
tt <- function(...) {
list2env(list(...), environment())
return(x)
}
if you define three dots as an argument for your function and want it to work, you need to tell your function where the dots actually go. in your example you are neither defining x as an argument, neither ... feature elsewhere in the body of your function. an example that actually works is:
tt <- function(x, ...){
mean(x, ...)
}
x <- c(1, 2, 3, NA)
tt(x)
#[1] NA
tt(x, na.rm = TRUE)
#[1] 2
here ... is referring to any other arguments that the function mean might take. additionally you have a regular argument x. in the first example tt(x) just returns mean(x), whilst in the second example tt(x, na.rm = TRUE), passes the second argument na.rm = TRUE to mean so tt returns mean(x, na.rm = TRUE).
Another way that the programmers of R use a lot is list(...) as in
tt <- function(...) {
args <- list(...) # As in this
if("x" %in% names(args))
return(args$x)
else
return("Something else.")
}
tt(x = 2)
#[1] 2
tt(y = 1, 2)
#[1] "Something else."
I believe that this is one of their favorite, if not the favorite, way of handling the dots arguments.

Not passing all optional arguments in apply

I am facing some problem with the apply function passing on arguments to a function when not needed. I understand that apply don't know what to do with the optional arguments and just pass them on the function.
But anyhow, here is what I would like to do:
First I want to specify a list of functions that I would like to use.
functions <- list(length, sum)
Then I would like to create a function which apply these specified functions on a data set.
myFunc <- function(data, functions) {
for (i in 1:length(functions)) print(apply(X=data, MARGIN=2, FUN=functions[[i]]))
}
This works fine.
data <- cbind(rnorm(100), rnorm(100))
myFunc(data, functions)
[1] 100 100
[1] -0.5758939 -5.1311173
But I would also like to use additional arguments for some functions, e.g.
power <- function(x, p) x^p
Which don't work as I want to. If I modify myFunc to:
myFunc <- function(data, functions, ...) {
for (i in 1:length(functions)) print(apply(X=data, MARGIN=2, FUN=functions[[i]], ...))
}
functions as
functions <- list(length, sum, power)
and then try my function I get
myFunc(data, functions, p=2)
Error in FUN(newX[, i], ...) :
2 arguments passed to 'length' which requires 1
How may I solve this issue?
Sorry for the wall of text. Thank you!
You can use Curry from functional to fix the parameter you want, put the function in the list of function you want to apply and finally iterate over this list of functions:
library(functional)
power <- function(x, p) x^p
funcs = list(length, sum, Curry(power, p=2), Curry(power, p=3))
lapply(funcs, function(f) apply(data, 2 , f))
With your code you can use:
functions <- list(length, sum, Curry(power, p=2))
myFunc(data, functions)
I'd advocate using Colonel's Curry approach, but if you want to stick to base R you can always:
funcs <- list(length, sum, function(x) power(x, 2))
which is roughly what Curry ends up doing
One option is to pass the parameters in a list with the arguments needed for each function. You can add those parameters to the others needed for apply using c and then use do.call to call the function. Something like this. I also wrap all the output in a list here rather than using print; your usage may vary.
power <- function(x, p) x^p
myFunc <- function(data, functions, parameters) {
lapply(seq_along(functions), function(i) {
p0 <- list(X=data, MARGIN=2, FUN=functions[[i]])
do.call(apply, c(p0, parameters[[i]]))
})
}
d <- matrix(1:6, nrow=2)
functions <- list(length, sum, power)
parameters <- list(NULL, NULL, p=3)
myFunc(d, functions, parameters)
You can use lazyeval package:
library(lazyeval)
my_evaluate <- function(data, expressions, ...) {
lapply(expressions, function(e) {
apply(data, MARGIN=2, FUN=function(x) {
lazy_eval(e, c(list(x=x), list(...)))
})
})
}
And use it like this:
my_expressions <- lazy_dots(sum = sum(x), sumpow = sum(x^p), length_k = length(x)*k )
data <- cbind(rnorm(100), rnorm(100))
my_evaluate(data, my_expressions, p = 2, k = 2)

Using "..." and "replicate"

In the documentation of sapply and replicate there is a warning regarding using ...
Now, I can accept it as such, but would like to understand what is behind it. So I've created this little contrived example:
innerfunction<-function(x, extrapar1=0, extrapar2=extrapar1)
{
cat("x:", x, ", xp1:", extrapar1, ", xp2:", extrapar2, "\n")
}
middlefunction<-function(x,...)
{
innerfunction(x,...)
}
outerfunction<-function(x, ...)
{
cat("Run middle function:\n")
replicate(2, middlefunction(x,...))
cat("Run inner function:\n")
replicate(2, innerfunction(x,...))
}
outerfunction(1,2,3)
outerfunction(1,extrapar1=2,3)
outerfunction(1,extrapar1=2,extrapar2=3)
Perhaps I've done something obvious horribly wrong, but I find the result of this rather upsetting. So can anyone explain to me why, in all of the above calls to outerfunction, I get this output:
Run middle function:
x: 1 , xp1: 0 , xp2: 0
x: 1 , xp1: 0 , xp2: 0
Run inner function:
x: 1 , xp1: 0 , xp2: 0
x: 1 , xp1: 0 , xp2: 0
Like I said: the docs seem to warn for this, but I do not see why this is so.
?replicate, in the Examples section, tells us explicitly that what you are trying to do does not and will not work. In the Note section of ?replicate we have:
If ‘expr’ is a function call, be aware of assumptions about where
it is evaluated, and in particular what ‘...’ might refer to. You
can pass additional named arguments to a function call as
additional named arguments to ‘replicate’: see ‘Examples’.
And if we look at Examples, we see:
## use of replicate() with parameters:
foo <- function(x=1, y=2) c(x,y)
# does not work: bar <- function(n, ...) replicate(n, foo(...))
bar <- function(n, x) replicate(n, foo(x=x))
bar(5, x=3)
My reading of the docs is that they do far more than warn you about using ... in replicate() calls; they explicitly document that it does not work. Much of the discussion in that help file relates to the ... argument of the other functions, not necessarily to replicate().
If you look at the code for replicate:
> replicate
function (n, expr, simplify = TRUE)
sapply(integer(n), eval.parent(substitute(function(...) expr)),
simplify = simplify)
<environment: namespace:base>
You see that the function is evaluated in the parent frame, where the ... from your calling function no longer exists.
There actually is a way to do this:
# Simple function:
ff <- function(a,b) print(a+b)
# This will NOT work:
testf <- function(...) {
replicate(expr = ff(...), n = 5)
}
testf(45,56) # argument "b" is missing, with no default
# This will:
testf <- function(...) {
args <- as.list(substitute(list(...)))[-1L]
replicate(expr = do.call(ff, args), n = 5)
}
testf(45,56) # 101
An alternative way to do that:
g <- function(x, y) x + y
f <- function(a = 1, ...) {
arg_list <- list(...)
replicate(n = 3, expr = do.call(g, args = arg_list))
}
f(x = 1, y = 2)

Simplify ave() or aggregate() with several inputs

How can I write this all in one line?
mydata is a "zoo" series, limit is a numeric vector of the same size
tmp <- ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cummax(x)-x)
tmp <- (tmp < limit)
final <- ave(tmp, as.Date(index(mydata)),
FUN = function(x) cumprod(x))
I've tried to use two vectors as argument to ave(...) but it seems to accept just one even if I join them into a matrix.
This is just an example, but any other function could be use.
Here I need to compare the value of cummax(mydata)-mydata with a numeric vector and
once it surpasses it I'll keep zeros till the end of the day. The cummax is calculated from the beginning of each day.
If limit were a single number instead of a vector (with different possible numbers) I could write it:
ave(coredata(mydata), as.Date(index(mydata)),
FUN = function(x) cumprod((cummax(x) - x) < limit))
But I can't introduce there a vector longer than x (it should have the same length than each day) and I don't know how to introduce it as another argument in ave().
Seems like this routine imposes intraday stoploss based on maxdrawdown. So I assume you want to be able to pass in variable limit as a second argument to your aggregation function which only currently only takes 1 function due to the way ave works.
If putting all this in one line is not an absolute must, I can share a function I've written that generalizes aggregation via "cut variables". Here's the code:
mtapplylist2 <- function(t, IDX, DEF, MoreArgs=NULL, ...)
{
if(mode(DEF) != "list")
{
cat("Definition must be list type\n");
return(NULL);
}
a <- c();
colnames <- names(DEF);
for ( i in 1:length(DEF) )
{
def <- DEF[[i]];
func <- def[1];
if(mode(func) == "character") { func <- get(func); }
cols <- def[-1];
# build the argument to be called
arglist <- list();
arglist[[1]] <- func;
for( j in 1:length(cols) )
{
col <- cols[j];
grp <- split(t[,col], IDX);
arglist[[1+j]] <- grp;
}
arglist[["MoreArgs"]] <- MoreArgs;
v <- do.call("mapply", arglist);
# print(class(v)); print(v);
if(class(v) == "matrix")
{
a <- cbind(a, as.vector(v));
} else {
a <- cbind(a, v);
}
}
colnames(a) <- colnames;
return(a);
}
And you can use it like this:
# assuming you have the data in the data.frame
df <- data.frame(date=rep(1:10,10), ret=rnorm(100), limit=rep(c(0.25,0.50),50))
dfunc <- function(x, ...) { return(cummax(x)-x ) }
pfunc <- function(x,y, ...) { return((cummax(x)-x) < y) }
# assumes you have the function declared in the same namespace
def <- list(
"drawdown" = c("dfunc", "ret"),
"hasdrawdown" = c("pfunc", "ret", "limit")
);
# from R console
> def <- list("drawdown" = c("dfunc", "ret"),"happened" = c("pfunc","ret","limit"))
> dim( mtapplylist2(df, df$date, def) )
[1] 100 2
Notice that the "def" variable is a list containing the following items:
computed column name
vector arg function name as a string
name of the variable in the input data.frame that are inputs into the function
If you look at the guts of "mtapplylist2" function, the key components would be "split" and "mapply". These functions are sufficiently fast (I think split is implemented in C).
This works with functions requiring multiple arguments, and also for functions returning vector of the same size or aggregated value.
Try it out and let me know if this solves your problem.

Resources