Is there a way to write a function in which one of the arguments indicates what function to apply?
For example, if I have a function:
mf = function(data, option, level)
where I want option to tell whether to calculate the mean, median or sd of a data set?
Yes, one option is to just pass a function to option. E.g.
mf <- function(data, option) {
option <- match.fun(option)
option(data)
}
set.seed(42)
dat <- rnorm(10)
mf(dat, option = mean)
Which gives:
> set.seed(42)
> dat <- rnorm(10)
> mean(dat)
[1] 0.5472968
> mf(dat, option = mean)
[1] 0.5472968
> sd(dat)
[1] 0.8354488
> mf(dat, option = sd)
[1] 0.8354488
match.fun() is the standard R way of matching to an available function. In the example I pass the function itself, but match.fun() allows other ways of referring to a function, for example as a character string:
> mf(dat, option = "mean")
[1] 0.5472968
match.fun() returns a function that can be used as any other function, hence option() is a function that is essentially the same as the function passed to the option argument or is the function named in the option argument.
It isn't clear how the level argument was supposed to be used to I have ignored that above.
I should probably add that if you want to pass in any arguments to the applied function then you'll want to use ... in the function definition, e.g.:
mf <- function(data, option, ...) {
option <- match.fun(option)
option(data, ...)
}
Hence we can do things like this
set.seed(42)
dat2 <- rnorm(10)
dat2[4] <- NA
mean(dat2)
mean(dat2, na.rm = TRUE)
mf(dat2, mean, na.rm = TRUE)
the last three lines giving
> mean(dat2)
[1] NA
> mean(dat2, na.rm = TRUE)
[1] 0.5377895
> mf(dat2, mean, na.rm = TRUE)
[1] 0.5377895
There is a bit of a problem in that "data set" in R usually means a dataframe and there is no median.data.frame so you need to use both lapply and do.call:
df <- data.frame(x=rnorm(10), y=rnorm(10))
mf = function(data, option="mean") {lapply( data,
function(col) do.call(option, list(col))) }
mf(df)
#-------------
$x
[1] 0.01646814
$y
[1] 0.5388518
You did not indicate what "level" was supposed to do, so I left it out of the equation,
> mf(df, sd)
$x
[1] 1.169847
$y
[1] 0.8907117
Related
I'm trying to write a parent function that calls a bunch of sub-functions that all have pretty sensible defaults and are well documented. Based on the value of one parameter, there are potentially different arguments I'd like to pass down to different sub-functions for customization. Is there a way to pass arguments to multiple functions using elipsis or another strategy?
Here's a simple example; here the challenge is being able to pass na.rm and/or base when a user wants, but otherwise use the existing defaults:
dat <- c(NA, 1:5)
# I want a flexible function that uses sensible defaults unless otherwise specified
meanLog<-function(x, ...){
y <- log(x, ...)
z <- mean(y, ...)
return(z)
}
# I know I can pass ... to one function wrapped inside this one.
justLog <- function(x, ...){
log(x, ...)
}
justLog(dat)
justLog(dat, base = 2)
# or another
justMean <- function(x, ...){
mean(x, ...)
}
justMean(dat)
justMean(dat, na.rm =T)
# but I can't pass both arguments
meanLog(dat) # works fine, but I want to customize a few things
meanLog(dat, na.rm =T, base = 2)
justMean(dat, base =2)
# In this case that is because justLog breaks if it gets an unused na.rm
justLog(dat, na.rm =T)
1) Define do.call2 which is like do.call except that it accepts unnamed arguments as well as named argument in the character vector accepted which defaults to the formals in the function.
Note that the arguments of mean do not include na.rm -- it is slurped up by the dot dot dot argument -- but the mean.default method does. Also primitive functions do not have formals so the accepted argument must be specified explicitly for those rather than defaulted.
do.call2 <- function(what, args, accepted = formalArgs(what)) {
ok <- names(args) %in% c("", accepted)
do.call(what, args[ok])
}
# test
dat <- c(NA, 1:5)
meanLog <- function(x, ...){
y <- do.call2("log", list(x, ...), "base")
z <- do.call2("mean.default", list(y, ...))
return(z)
}
meanLog(dat, na.rm = TRUE, base = 2)
## [1] 1.381378
# check
mean(log(dat, base = 2), na.rm = TRUE)
## [1] 1.381378
2) Another possibility is to provide separate arguments for mean and log.
(A variation of that is to use dot dot dot for one of the functions and argument lists for the others. For example nls in R uses dot dot dot but also uses a control argument to specify other arguments.)
# test
dat <- c(NA, 1:5)
meanLog <- function(x, logArgs = list(), meanArgs = list()) {
y <- do.call("log", c(list(x), logArgs))
z <- do.call("mean", c(list(y), meanArgs))
return(z)
}
meanLog(dat, logArgs = list(base = 2), meanArgs = list(na.rm = TRUE))
## [1] 1.381378
# check
mean(log(dat, base = 2), na.rm = TRUE)
## [1] 1.381378
I wrote a wrapper around ftable because I need to compute flat tables with frequency and percentage for many variables. As ftable method for class "formula" uses non-standard evaluation, the wrapper relies on do.call and match.call to allow the use of the subset argument of ftable (more details in my previous question).
mytable <- function(...) {
do.call(what = ftable,
args = as.list(x = match.call()[-1]))
# etc
}
However, I cannot use this wrapper with lapply nor with:
# example 1: error with "lapply"
lapply(X = warpbreaks[c("breaks",
"wool",
"tension")],
FUN = mytable,
row.vars = 1)
Error in (function (x, ...) : object 'X' not found
# example 2: error with "with"
with(data = warpbreaks[warpbreaks$tension == "L", ],
expr = mytable(wool))
Error in (function (x, ...) : object 'wool' not found
These errors seem to be due to match.call not being evaluated in the right environment.
As this question is closely linked to my previous one, here is a sum up of my problems:
The wrapper with do.call and match.call cannot be used with lapply or with.
The wrapper without do.call and match.call cannot use the subset argument of ftable.
And a sum up of my questions:
How can I write a wrapper which allows both to use the subset argument of ftable and to be used with lapply and with? I have ideas to avoid the use of lapply and with, but I am looking to understand and correct these errors to improve my knowledge of R.
Is the error with lapply related to the following note from ?lapply?
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g., bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[i]], ...),
with i replaced by the current (integer or double) index. This is not
normally a problem, but it can be if FUN uses sys.call or match.call
or if it is a primitive function that makes use of the call. This
means that it is often safer to call primitive functions with a
wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is
required to ensure that method dispatch for is.numeric occurs
correctly.
The problem with using match.call with lapply is that match.call returns the literal call that passed into it, without any interpretation. To see what's going on, let's make a simpler function which shows exactly how your function is interpreting the arguments passed into it:
match_call_fun <- function(...) {
call = as.list(match.call()[-1])
print(call)
}
When we call it directly, match.call correctly gets the arguments and puts them in a list that we can use with do.call:
match_call_fun(iris['Species'], 9)
[[1]]
iris["Species"]
[[2]]
[1] 9
But watch what happens when we use lapply (I've only included the output of the internal print statement):
lapply('Species', function(x) match_call_fun(iris[x], 9))
[[1]]
iris[x]
[[2]]
[1] 9
Since match.call gets the literal arguments passed to it, it receives iris[x], not the properly interpreted iris['Species'] that we want. When we pass those arguments into ftable with do.call, it looks for an object x in the current environment, and then returns an error when it can't find it. We need to interpret
As you've seen, adding envir = parent.frame() fixes the problem. This is because, adding that argument tells do.call to evaluate iris[x] in the parent frame, which is the anonymous function in lapply where x has it's proper meaning. To see this in action, let's make another simple function that uses do.call to print ls from 3 different environmental levels:
z <- function(...) {
print(do.call(ls, list()))
print(do.call(ls, list(), envir = parent.frame()))
print(do.call(ls, list(), envir = parent.frame(2)))
}
When we call z() from the global environment, we see the empty environment inside the function, then the Global Environment:
z()
character(0) # Interior function environment
[1] "match_call_fun" "y" "z" # GlobalEnv
[1] "match_call_fun" "y" "z" # GlobalEnv
But when we call from within lapply, we see that one level of parent.frame up is the anonymous function in lapply:
lapply(1, z)
character(0) # Interior function environment
[1] "FUN" "i" "X" # lapply
[1] "match_call_fun" "y" "z" # GlobalEnv
So, by adding envir = parent.frame(), do.call knows to evaluate iris[x] in the lapply environment where it knows that x is actually 'Species', and it evaluates correctly.
mytable_envir <- function(...) {
tab <- do.call(what = ftable,
args = as.list(match.call()[-1]),
envir = parent.frame())
prop <- prop.table(x = tab,
margin = 2) * 100
bind <- cbind(as.matrix(x = tab),
as.matrix(x = prop))
margin <- addmargins(A = bind,
margin = 1)
round(x = margin,
digits = 1)
}
# This works!
lapply(X = c("breaks","wool","tension"),
FUN = function(x) mytable_envir(warpbreaks[x],row.vars = 1))
As for why adding envir = parent.frame() makes a difference since that appears to be the default option. I'm not 100% sure, but my guess is that when the default argument is used, parent.frame is evaluated inside the do.call function, returning the environment in which do.call is run. What we're doing, however, is calling parent.frame outside do.call, which means it returns one level higher than the default version.
Here's a test function that takes parent.frame() as a default value:
fun <- function(y=parent.frame()) {
print(y)
print(parent.frame())
print(parent.frame(2))
print(parent.frame(3))
}
Now look at what happens when we call it from within lapply both with and without passing in parent.frame() as an argument:
lapply(1, function(y) fun())
<environment: 0x12c5bc1b0> # y argument
<environment: 0x12c5bc1b0> # parent.frame called inside
<environment: 0x12c5bc760> # 1 level up = lapply
<environment: R_GlobalEnv> # 2 levels up = globalEnv
lapply(1, function(y) fun(y = parent.frame()))
<environment: 0x104931358> # y argument
<environment: 0x104930da8> # parent.frame called inside
<environment: 0x104931358> # 1 level up = lapply
<environment: R_GlobalEnv> # 2 levels up = globalEnv
In the first example, the value of y is the same as what you get when you call parent.frame() inside the function. In the second example, the value of y is the same as the environment one level up (inside lapply). So, while they look the same, they're actually doing different things: in the first example, parent.frame is being evaluated inside the function when it sees that there is no y= argument, in the second, parent.frame is evaluated in the lapply anonymous function first, before calling fun, and then is passed into it.
As you only want to pass all the arguments passed to ftable u do not need the do.call().
mytable <- function(...) {
tab <- ftable(...)
prop <- prop.table(x = tab,
margin = 2) * 100
bind <- cbind(as.matrix(x = tab),
as.matrix(x = prop))
margin <- addmargins(A = bind,
margin = 1)
return(round(x = margin,
digits = 1))
}
The following lapply creates a table for every Variable separatly i don't know if that is what you want.
lapply(X = c("breaks",
"wool",
"tension"),
FUN = function(x) mytable(warpbreaks[x],
row.vars = 1))
If you want all 3 variables in 1 table
warpbreaks$newVar <- LETTERS[3:4]
lapply(X = cbind("c(\"breaks\", \"wool\", \"tension\")",
"c(\"newVar\", \"tension\",\"wool\")"),
FUN = function(X)
eval(parse(text=paste("mytable(warpbreaks[,",X,"],
row.vars = 1)")))
)
Thanks to this issue, the wrapper became:
# function 1
mytable <- function(...) {
do.call(what = ftable,
args = as.list(x = match.call()[-1]),
envir = parent.frame())
# etc
}
Or:
# function 2
mytable <- function(...) {
mc <- match.call()
mc[[1]] <- quote(expr = ftable)
eval.parent(expr = mc)
# etc
}
I can now use the subset argument of ftable, and use the wrapper in lapply:
lapply(X = warpbreaks[c("wool",
"tension")],
FUN = function(x) mytable(formula = x ~ breaks,
data = warpbreaks,
subset = breaks < 15))
However I do not understand why I have to supply envir = parent.frame() to do.call as it is a default argument.
More importantly, these methods do not resolve another issue: I can not use the subset argument of ftable with mapply.
The problem I am trying to tackle here is needing to apply (execute) an S3 object which is essentially a vector-like structure. This may contain various formulas which at some stage I need to evaluate for a single argument, in order to get back a vector-like object of the original shape, containing the evaluation of its constituent formulas at the given argument.
Examples of this (just to illustrate) might be a matrix of transformation - say rotation - which would take the angle to rotate by, and produce a matrix of values by which to multiply a point, for the given rotation. Another example might be the vector of states in a problem in classical mechanics. Then given t, v, a, etc, it could return s...
Now, I have created my container object in S3, and its working fine in most respects, using generic methods; I also found the Ops.myClass system of operator overloading very useful.
To complete my class, all I need now is a way to specify it as executable.
I see that there are various mechanisms that will do what I want in part, for instance I suppose that as.function() will convert the object to behave as I want, and something like lapply() could be used for the "reverse" application of the argument to the functions. What I am not sure how to do is link it all up so that I can do something like this mock-up:
new_Object <- function(<all my function vector stuff spec>)
vtest <- new_Object(<say, sin, cos, tan>)
vtest(1)
==>
myvec(.8414709848078965 .5403023058681398 1.557407724654902)
(Yes, I have already specified a generic print() routine that will make it appear nice)
All suggestions, sample code, links to examples are welcome.
PS =====
I have added some basic example code as per request.
I am not sure how much would be too much, so the full working minimal example, including operator overloading is in this gist here.
I am only showing the constructor and helper functions below:
# constructor
new_Struct <- function(stype , vec){
stopifnot(is.character(stype)) # enforce up | down
stopifnot(is.vector(vec))
structure(vec,class="Struct", type=stype)
}
# constructor helper functions --- need to allow for nesting!
up <-function(...){
vec <- unlist(list(...),use.names = FALSE)
new_Struct("up",vec)
}
down <-function(...){
vec <- unlist(list(...),use.names = FALSE)
new_Struct("down",vec)
}
The above code behaves thus:
> u1 <- up(1,2,3)
> u2 <- up(3,4,5)
> d1 <- down(u1)
> d1
[1] down(1, 2, 3)
> u1+u2
[1] up(4, 6, 8)
> u1+d1
Error: '+' not defined for opposite tuple types
> u1*d1
[1] 14
> u1*u2
[,1] [,2] [,3]
[1,] 3 4 5
[2,] 6 8 10
[3,] 9 12 15
> u1^2
[1] 14
> s1 <- up(sin,cos,tan)
> s1
[1] up(.Primitive("sin"), .Primitive("cos"), .Primitive("tan"))
> s1(1)
Error in s1(1) : could not find function "s1"
What I need, is for it to be able to do this:
> s1(1)
[1] up(.8414709848078965 .5403023058681398 1.557407724654902)
You can not call each function in a list of functions without a loop.
I'm not fully understanding all requirements, but this should give you a start:
new_Struct <- function(stype , vec){
stopifnot(is.character(stype)) # enforce up | down
stopifnot(is.vector(vec) || is.function(vec))
structure(vec,class="Struct", type=stype)
}
# constructor helper functions --- need to allow for nesting!
up <- function(...) UseMethod("up")
up.default <- function(...){
vals <- list(...)
stopifnot(all(vapply(vals, is.vector, FUN.VALUE = logical(1))))
vec <- unlist(vals, use.names = FALSE)
new_Struct("up",vec)
}
up.function <- function(...){
funs <- list(...)
stopifnot(all(vapply(funs, is.function, FUN.VALUE = logical(1))))
new_Struct("up", function(x) new_Struct("up", sapply(funs, do.call, list(x))))
}
up(1, 2, 3)
#[1] 1 2 3
#attr(,"class")
#[1] "Struct"
#attr(,"type")
#[1] "up"
up(1, 2, sin)
#Error in up.default(1, 2, sin) :
# all(vapply(vals, is.vector, FUN.VALUE = logical(1))) is not TRUE
up(sin, 1, 2)
#Error in up.function(sin, 1, 2) :
# all(vapply(funs, is.function, FUN.VALUE = logical(1))) is not TRUE
s1 <- up(sin, cos, tan)
s1(1)
#[1] 0.8414710 0.5403023 1.5574077
#attr(,"class")
#[1] "Struct"
#attr(,"type")
#[1] "up"
After some thought I have come up with a way to approach this, it's not perfect, it would be great if someone could figure out a way to make the function call implicit/transparent.
So, for now I just use the call() mechanism on the object, and that seems to work fine. Here's the pertinent part of the code, minus checks. I'll put up the latest full version on the same gist as above.
# constructor
new_Struct <- function(stype , vec){
stopifnot(is.character(stype)) # enforce up | down
stopifnot(is.vector(vec))
structure(vec,class="Struct", type=stype)
}
# constructor helper functions --- need to allow for nesting!
up <- function(...){
vec <- unlist(list(...), use.names = FALSE)
new_Struct("up",vec)
}
down <- function(...){
vec <- unlist(list(...), use.names = FALSE)
new_Struct("down",vec)
}
# generic print for tuples
print.Struct <- function(s){
outstr <- sprintf("%s(%s)", attributes(s)$type, paste(c(s), collapse=", "))
print(noquote(outstr))
}
# apply the structure - would be nice if this could be done *implicitly*
call <- function(...) UseMethod("call")
call.Struct <- function(s,x){
new_Struct(attributes(s)$type, sapply(s, do.call, list(x)))
}
Now I can do:
> s1 <- up(sin,cos,tan)
> length(s1)
[1] 3
> call(s1,1)
[1] up(0.841470984807897, 0.54030230586814, 1.5574077246549)
>
Not as nice as my ultimate target of
> s1(1)
[1] up(0.841470984807897, 0.54030230586814, 1.5574077246549)
but it will do for now...
I would like use a function that uses the standard deparse(substitute(x)) trick within lapply. Unfortunately I just get the argument of the loop back. Here's my completely useless reproducible example:
# some test data
a <- 5
b <- 6
li <- list(a1=a,b2=b)
# my test function
tf <- function(obj){
nm <- deparse(substitute(obj))
res <- list(myName=nm)
res
}
tf(a)
#returns
$myName
[1] "a"
which is fine. If I use lapply I either get [[1L]] or the x argument of an anonymous function.
lapply(li,function(x) tf(x))
# returns
$a1
$a1$myName
[1] "x"
$b2
$b2$myName
[1] "x"
Is there any way to obtain the following?
$a1
$a1$myName
[1] "a1"
$b2
$b2$myName
[1] "b1"
If there's anything more general on deparse(substitute(x)) and lapply I'd also eager to know.
EDIT:
The problem as opposed to using an anonymous function that accepts multiple arguments and can thus use the name of the object and the object itself does not work because, the tf function will only accept one argument. So this does not work here...
A possible solution :
lapply(li, function(x) {
call1 <- sys.call(1)
call1[[1]] <- as.name("names")
call1 <- call1[1:2]
nm <- eval(call1)
y <- deparse(substitute(x))
y <- gsub("\\D", "", y)
y <- as.numeric(y)
list(myname=nm[y])
})
I would like to write a function that handles multiple data types. Below is an example that works but seems clunky. Is there a standard (or better) way of doing this?
(It's times like this I miss Matlab where everything is one type :>)
myfunc = function(x) {
# does some stuff to x and returns a value
# at some point the function will need to find out the number of elements
# at some point the function will need to access an element of x.
#
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
x.vec <- as.vector(as.matrix(as.data.frame(x)))
n <- length(x.vec)
ret <- x.vec[n/3] # this line only for concreteness
return(ret)
}
Use S3 methods. A quick example to get you started:
myfunc <- function(x) {
UseMethod("myfunc",x)
}
myfunc.data.frame <- function(x) {
x.vec <- as.vector(as.matrix(x))
myfunc(x.vec)
}
myfunc.numeric <- function(x) {
n <- length(x)
ret <- x[n/3]
return(ret)
}
myfunc.default <- function(x) {
stop("myfunc not defined for class",class(x),"\n")
}
Two notes:
The ... syntax passes any additional arguments on to functions. If you're extending an existing S3 method (e.g. writing something like summary.myobject), then including the ... is a good idea, because you can pass along arguments conventionally given to the canonical function.
print.myclass <- function(x,...) {
print(x$keyData,...)
}
You can call functions from other functions and keep things nice and parsimonious.
Hmm, your documentation for the function is
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
and if one supplies an object as you claim is require, isn't it already a vector and not a matrix or a data frame, hence obviating the need for separate methods/specific handling?
> dat <- data.frame(A = 1:10, B = runif(10))
> class(dat[,1])
[1] "integer"
> is.vector(dat[,1])
[1] TRUE
> is.vector(dat$A)
[1] TRUE
> is.numeric(dat$A)
[1] TRUE
> is.data.frame(dat$A)
[1] FALSE
I would:
myfunc <- function(x) {
# args:
# x: a column of data taking on many possible types
# e.g., vector, matrix, data.frame, timeSeries, list
n <- length(x)
ret <- x[n/3] # this line only for concreteness
return(ret)
}
> myfunc(dat[,1])
[1] 3
Now, if you want to handle different types of objects and extract a column, then S3 methods would be a way to go. Perhaps your example is over simplified for actual use? Anyway, S3 methods would be something like:
myfunc <- function(x, ...)
UseMethod("myfunc", x)
myfunc.matrix <- function(x, j = 1, ...) {
x <- x[, j]
myfunc.default(x, ...)
}
myfunc.data.frame <- function(x, j = 1, ...) {
x <- data.matrix(x)
myfunc.matrix(x, j, ...)
}
myfunc.default <- function(x, ...) {
n <- length(x)
x[n/3]
}
Giving:
> myfunc(dat)
[1] 3
> myfunc(data.matrix(dat))
[1] 3
> myfunc(data.matrix(dat), j = 2)
[1] 0.2789631
> myfunc(dat[,2])
[1] 0.2789631
You probably should try to use an S3 method for writing a function that will handle multiple datatypes.
A good reference is here: http://www.biostat.jhsph.edu/~rpeng/biostat776/classes-methods.pdf