Selecting Which Argument to Pass Dynamically in R

I'm trying to pass a specific argument dynamically to a function, where the function has default values for most or all arguments.
Here's a toy example:
library(data.table)
mydat <- data.table(evildeeds = rep(c("All", "Lots", "Some", "None"), 4),
                    capitalsins = rep(c("All", "Kinda", "Not_really", "Virginal"),
                                      each = 4),
                    hellprobability = seq(1, 0, length.out = 16))
hellraiser <- function(arg1 = "All", arg2 = "All") {
  mydat[(evildeeds %in% arg1) & (capitalsins %in% arg2), hellprobability]
}
hellraiser()
hellraiser(arg1 = "Some")
whicharg = "arg1"
whichval = "Some"
#Could not get this to work:
hellraiser(eval(paste0(whicharg, '=', whichval)))
I would love a way to specify dynamically which argument I'm passing: in other words, get the same result as hellraiser(arg1 = "Some"), but while picking whether to send arg1 or arg2 dynamically. The goal is to be able to call the function with only one parameter specified, and to specify which one dynamically.

You could use some form of do.call like
do.call("hellraiser", setNames(list(whichval), whicharg))
Really, though, this just seems like a bad way to handle arguments for your functions. It might be better to treat your parameters like a list that you can manipulate more easily. Here's a version that allows you to choose values where the argument names are treated like column names:
hellraiser2 <- function(..., .dots = list()) {
  dots <- c(.dots, list(...))
  expr <- lapply(names(dots), function(x) bquote(.(as.name(x)) %in% .(dots[[x]])))
  expr <- Reduce(function(a, b) bquote(.(a) & .(b)), expr)
  eval(bquote(mydat[.(expr), hellprobability]))
}
hellraiser2(evildeeds="Some", capitalsins=c("Kinda","Not_really"))
hellraiser2(.dots=list(evildeeds="Some", capitalsins=c("Kinda","Not_really")))
This use of ... and .dots= syntax is borrowed from the dplyr standard evaluation functions.

I managed to get the result with
hellraiser(eval(parse(text=paste(whicharg, ' = \"', whichval, '\"', sep=''))))

Related

Is there a way to pass arguments with logical operators (!=, >,<) to a function?

create_c <- function(df, line_number = NA, prior_trt, line_name, biomarker, ...) {
  if (!"data.frame" %in% class(df)) {
    stop("First input must be dataframe")
  }
  # handle extra arguments
  args <- enquos(...)
  names(args) <- tolower(names(args))
  # check for unknown argument - cols that do not exist in df
  check_args_exist(df, args)
  # argument to expression
  ex_args <- unname(imap(args, function(expr, name) quo(!!sym(name) == !!expr)))
  # special case arguments
  if (!missing(line_number)) {
    df <- df %>% filter(line_number %in% (!!line_number))
    if (!missing(prior_trt)) {
      df <- filter_arg(df. = df, arg = prior_trt, col = "prior_trt_", val = "y")
    }
  }
  if (!missing(biomarker)) {
    df <- filter_arg(df. = df, arg = biomarker, col = "has_", val = "positive")
  }
  if (!missing(line_name)) {
    ln <- list()
    if (!!str_detect(line_name[1], "or")) {
      line_name <- str_split(line_name, " or ", simplify = TRUE)
    }
    for (i in 1:length(line_name)) {
      ln[[i]] <- paste(tolower(sort(strsplit(line_name[i], "\\+")[[1]])), collapse = ",")
    }
    df <- df %>% filter(line_name %in% (ln))
  }
  df <- df %>%
    group_by(patient_id) %>%
    slice(which.min(line_number)) %>%
    ungroup()
  df <- df %>% filter(!!!ex_args)
  invisible(df)
}
I have this function where I am basically filtering various columns based on parameters users pass. I want the users to be able to pass logical operators like >,<, != for some of the parameters. Right now my function is not able to handle any other operators besides '='. Is there a way to accomplish this?
create_c(df = bsl_all_nsclc,
         line_number > 2)
create_c(df, biomarker != "positive")
Error in tolower(arg) : object 'biomarker' not found
Certainly there is a way: operators are regular functions in R, you can pass them around like any other function.
The only complication is that the operators are non-syntactic names, so you can’t just pass them “as is”: that would confuse the parser. Instead, you need to wrap them in backticks to make their use syntactically valid where a name would be expected:
filter_something = function (value, op) {
  op(value, 13)
}
filter_something(cars$speed, `>`)
filter_something(cars$speed, `<`)
filter_something(cars$speed, `==`)
And since R also supports non-standard evaluation of function arguments, you can also pass unevaluated expressions — this gets slightly more complicated, since you’d want to evaluate them in the correct context. ‘rlang’/‘dplyr’ uses data masking for this.
How exactly you need to apply this depends entirely on the context in which the expression is to be used. In many cases, you can simply dispatch them to the corresponding ‘dplyr’ functions, e.g.
filter_something2 = function (.data, expr) {
  .data %>%
    filter({{expr}})
}
filter_something2(cars, speed < 13)
The “secret sauce” here is the {{…}} syntax. This works because filter from ‘dplyr’ accepts unevaluated arguments and handles {{expr}} specially by transforming it into (effectively) !! enexpr(expr). That is, expr is first “defused”: it is explicitly marked as unevaluated, and the name expr is replaced by the unevaluated expression it binds to (speed < 13 in the above). Next, this unevaluated expression is unquoted: the wrapper is “peeled off”, and the unevaluated expression itself is handled inside filter as if it had been passed as filter(.data, speed < 13). In other words, the name expr is substituted with speed < 13 in the call expression.
For a more thorough explanation, please refer to the Programming with dplyr vignette.
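As a rough sketch of that defuse-then-inject description (the equivalent long form in user code, assuming ‘dplyr’ and ‘rlang’ are available; this is not how filter itself is implemented):
filter_something3 = function (.data, expr) {
  # defuse the argument with enexpr(), then inject the captured expression with !!
  .data %>%
    filter(!!rlang::enexpr(expr))
}
filter_something3(cars, speed < 13)  # same rows as filter_something2(cars, speed < 13)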

What is the best way to pass a list of functions as argument?

I am looking for advice on the best way to code passing a list of functions as argument.
What I want to do:
I would like to pass a list of functions as an argument, apply them to a specific input, and name the output based on those functions.
A "reproducible" example
input = 1:5
Functions to pass are mean, min
Expected call:
foo(input, something_i_ask_help_for)
Expected output:
list(mean = 3, min = 1)
If it's not perfectly clear, please see my two solutions to have an illustration.
Solution 1: Passing functions as arguments
foo <- function(input, funs){
  # Initialize output
  output = list()
  # Compute output
  for (fun_name in names(funs)){
    # For each function I calculate it and store it in output
    output[fun_name] = funs[[fun_name]](input)
  }
  return(output)
}
foo(1:5, list(mean=mean, min=min))
What I don't like with this method is that we can't call it by doing: foo(1:5, list(mean, min)).
Solution 2: passing functions names as argument and using get
foo2 <- function(input, funs){
  # Initialize output
  output = list()
  # Compute output
  for (fun in funs){
    # For each function I calculate it and store it in output
    output[fun] = get(fun)(input)
  }
  return(output)
}
foo2(1:5, c("mean", "min"))
What I don't like with this method is that we are not really passing the function object as an argument.
My question:
Both ways work, but I'm not quite sure which one to choose.
Could you help me by:
Telling me which one is best?
What are the advantages and drawbacks of each method?
Is there another (better) method?
If you need any more information, don't hesitate to ask.
Thanks!
Simplifying solutions in question
The first of the solutions in the question requires that the list be named, and the second requires that the functions' names be passed as character strings. Those two user interfaces could be implemented using the following simplifications. Note that we add an envir argument to foo2 to ensure that function-name lookup occurs in the expected environment. Of the two, the first seems cleaner, but if the functions were to be used interactively and less typing were desired, the second does away with having to specify the names.
foo1 <- function(input, funs) Map(function(f) f(input), funs)
foo1(1:5, list(min = min, max = max)) # test
foo2 <- function(input, nms, envir = parent.frame()) {
  Map(function(nm) do.call(nm, list(input), envir = envir), setNames(nms, nms))
}
foo2(1:5, list("min", "max")) # test
Alternately we could build foo2 on foo1:
foo2a <- function(input, funs, envir = parent.frame()) {
  foo1(input, mget(unlist(funs), envir = envir, inherits = TRUE))
}
foo2a(1:5, list("min", "max")) # test
or base the user interface on passing a formula containing the function names since formulas already incorporate the notion of environment:
foo2b <- function(input, fo) foo2(input, all.vars(fo), envir = environment(fo))
foo2b(1:5, ~ min + max) # test
Optional names while passing function itself
However, the question indicates that it is preferred that
the functions themselves be passed
names are optional
To incorporate those features the following allows the list to have names or not or a mixture. If a list element does not have a name then the expression defining the function (usually its name) is used.
We can derive the names from the list's names; when a name is missing we can use the function's name itself, or, if the function is anonymous and so given as its definition, the name can be the expression defining the function.
The key is to use match.call and pick it apart. We ensure that funs is a list in case it is specified as a character vector. match.fun will interpret functions and character strings naming functions and look them up in the parent frame, so we use a for loop instead of Map or lapply so as not to generate a new function scope.
foo3 <- function(input, funs) {
  cl <- match.call()[[3]][-1]
  nms <- names(cl)
  if (is.null(nms)) nms <- as.character(cl)
  else nms[nms == ""] <- as.character(cl)[nms == ""]
  funs <- as.list(funs)
  for(i in seq_along(funs)) funs[[i]] <- match.fun(funs[[i]])(input)
  setNames(funs, nms)
}
foo3(1:5, list(mean = mean, min = "min", sd, function(x) x^2))
giving:
$mean
[1] 3
$min
[1] 1
$sd
[1] 1.581139
$`function(x) x^2`
[1] 1 4 9 16 25
One thing you are missing is replacing the for loops with lapply. Also, for functional programming it is often good practice to have each function do one thing. I personally like the version from solution 1, where you pass the functions in directly, because it avoids another call in R and is therefore more efficient. In solution 2, it is best to use match.fun instead of get; match.fun is stricter than get when searching for functions.
x <- 1:5
foo <- function(input, funs) {
  lapply(funs, function(fun) fun(input))
}
foo(x, c(mean=mean, min=min))
The above code simplifies your solution 1. To add to this function, you could add some error handling such as is.numeric for x and is.function for funs.
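A possible sketch of those checks (foo_checked and the error messages are illustrative, not part of the answer above):
foo_checked <- function(input, funs) {
  if (!is.numeric(input)) stop("'input' must be numeric")
  if (!all(vapply(funs, is.function, logical(1)))) stop("'funs' must only contain functions")
  lapply(funs, function(fun) fun(input))
}
foo_checked(x, c(mean = mean, min = min))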

What does builtins(internal = TRUE) return?

From ?builtins:
builtins(TRUE) returns an unsorted list of the names of internal functions, that is those which can be accessed as .Internal(foo(args ...)) for foo in the list.
I don't understand which functions are being returned.
I thought it would be all the closure functions in the base package that call .Internal().
However, the two sets don't match up.
base_objects <- mget(
  ls(baseenv(), all.names = TRUE),
  envir = baseenv()
)
internals <- names(
  Filter(
    assertive.types::is_internal_function,
    base_objects
  )
)
builtins_true <- builtins(internal = TRUE)
c(
  both = length(intersect(internals, builtins_true)),
  internals_not_builtins_true = length(setdiff(internals, builtins_true)),
  builtins_true_not_internals = length(setdiff(builtins_true, internals))
)
## both internals_not_builtins_true builtins_true_not_internals
## 288 125 226
I also thought that it might be the values listed in src/main/names.c in R's source code, and there definitely seems to be some overlap with this, but it isn't exactly this list of values.
What is builtins() doing when you pass internal = TRUE?
Stibu's comment is a specific example of the general problem. ?builtins says that it fetches the names of the objects it returns directly from the symbol table (this is the C symbol table).
And builtins(TRUE) returns all the built-in objects callable via .Internal. That, however, doesn't mean there must be an R function that calls .Internal(foo(args, ...)) for every such foo.
Stibu gave one example: the internal function may not be called by an R function with the same name, as is the case for many generic functions where the default method calls .Internal.
Another example is something like .addCondHands and .addRestart, which are called by withCallingHandlers and withRestarts, respectively.
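One way to see that pairing from the prompt (a sketch; the exact body varies between R versions):
body(withCallingHandlers)
# the printed body includes a call of the form .Internal(.addCondHands(...)),
# i.e. the internal entry point's name differs from the R function that invokes it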
It's also possible that one R function calls multiple .Internal functions. I don't know of an example of that off the top of my head though.
After more digging, it seems that the list of functions is everything in the R_FunTab[] object in src/main/names.c where the second digit of the eval column is 1.
Here's a script to retrieve them.
library(stringi)
library(magrittr)
library(dplyr)
names.c <- readLines("https://raw.githubusercontent.com/wch/r-source/56a1b08b7282c5488acb71ee244098f4fd94f7c7/src/main/names.c")
fun_tab <- names.c[92:974] %>%
  stri_replace_all_regex("^\\{", "") %>%
  stri_replace_all_fixed("{PP", "PP") %>%
  stri_replace_all_fixed("}},", "") %>%
  stri_replace_all_fixed("\\t", "")
funs <- read.csv(text = fun_tab, header = FALSE, comment.char = "/")
cols <- names.c[86] %>%
  stri_sub(4) %>%
  stri_split_regex("\\t+") %>%
  extract2(1) %>%
  stri_trim()
colnames(funs) <- cols
funs$eval <- formatC(funs$eval, width = 3, flag = "0")
# Internal fns have 2nd digit of eval col == 1. See names.c[62:71]
internals <- funs %>% filter_(~ substring(eval, 2, 2) == 1)
I see slight differences when examining
setdiff(internals$printname, builtins(TRUE))
setdiff(builtins(TRUE), internals$printname)
For example builtins(TRUE) doesn't include shell.exec() if you aren't running Windows; mem.limits() was only recently removed from the devel branch of R, so it shows up in builtins(TRUE) for the current release version of R.

Make list content available in a function environment

In order to avoid creating R functions with many arguments defining settings for a single object, I'm gathering them in a list,
list_my_obj <- list("var1" = ..., "var2" = ..., ..., "varN" = ...)
class(list_my_obj) <- "my_obj"
I then define functions that accept such a list as argument and inject the elements of the list in the function scope:
my_fun <- function(list_my_obj) {
  stopifnot(class(list_my_obj) == "my_obj")
  list2env(list_my_obj, envir = environment())
  rm(list_my_obj)
  var_sum <- var1 + var2
  (...)
}
Injecting the elements of the list in the function scope allows me to avoid calling them with list_my_obj$var1, list_my_obj$var2, etc, later in the function, which would reduce the readability of the code.
This solution works perfectly fine, however it produces a note when running R CMD check, saying "no visible binding for global variable" for var1, var2, ... varN.
To avoid such notes, one could just create new variables at the beginning of the function body "by hand" for each element of the list:
var1 <- list_my_obj$var1
(...)
varN <- list_my_obj$varN
but I would like to avoid this because N can be large.
Any better solution or idea on how to suppress the R CMD check notes in this case?
Thank you!
The function list2env is made for this, for example:
list2env(list_my_obj, envir = environment())
Try with (or within):
f <- function(x) {
  stopifnot(inherits(x, "my_obj"))
  with(x, {
    # ...
    var_sum <- var1 + var2
    # ...
    var_sum
  })
}
my_obj <- structure(list(var1 = 1, var2 = 2), class = "my_obj")
f(my_obj)
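If you also want the computed value returned as part of the object rather than on its own, the within() variant mentioned above could look roughly like this (a sketch; unclass() is used so that within()'s list method is dispatched for the classed object):
g <- function(x) {
  stopifnot(inherits(x, "my_obj"))
  # within() evaluates the expression with the list elements in scope
  # and returns a modified copy of the (unclassed) list
  within(unclass(x), var_sum <- var1 + var2)
}
g(my_obj)$var_sum  # 3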

Not fully understanding how SE works across the dplyr verbs

I'm trying to understand how SE works in dplyr so I can use variables as inputs to these functions. I'm having some trouble with understanding how this works across the different functions and when I should be doing what. It would be really good to understand the logic behind this.
Here are some examples:
library(dplyr)
library(lazyeval)
a <- c("x", "y", "z")
b <- c(1,2,3)
c <- c(7,8,9)
df <- data.frame(a, b, c)
The following is exactly why I'd use SE and the *_ variant of a function. I want to change the name of what's being mutated based on another variable.
#Normal mutate - copies b into a column called new
mutate(df, new = b)
#Mutate using a variable column name. Use mutate_ and the unquoted variable name. This doesn't create a column called "new"; it creates one literally called "col.name"
col.name <- "new"
mutate_(df, col.name = "b")
#Do I need to use interp? Doesn't work
expr <- interp(~(val = b), val = col.name)
mutate_(df, expr)
Now I want to filter in the same way. Not sure why my first attempt didn't work.
#Apply the same logic to filter_. The following doesn't return a result
val.to.filter <- "z"
filter_(df, "a" == val.to.filter)
#Do I need to use interp? Works. What's the difference compared to the above?
expr <- interp(~(a == val), val = val.to.filter)
filter_(df, expr)
Now I try to select_. Works as expected
#Apply the same logic to select_, an unquoted variable name works fine
col.to.select <- "b"
select_(df, col.to.select)
Now I move on to rename_. Knowing what worked for mutate and knowing that I had to use interp for filter, I try the following
#Now let's try to rename. Quoted constant, unquoted variable. Doesn't work
new.name <- "NEW"
rename_(df, "a" = new.name)
#Do I need an eval here? It worked for the filter so it's worth a try. Doesn't work 'Error: All arguments to rename must be named.'
expr <- interp(~(a == val), val = new.name)
rename_(df, expr)
Any tips on best practice when it comes to using variable names across the dplyr functions and when interp is required would be great.
The differences here are not related to which dplyr verb you are using. They are related to where you are trying to use the variable. You are mixing two things: whether the variable supplies a function argument's name or its value, and whether it should be interpreted as a name or as a character string.
Scenario 1:
You want to use your variable as an argument name. Such as in your mutate example.
mutate(df, new = b)
Here new is the name of a function argument; it appears to the left of =. The only way to do this is to use the .dots argument, like:
col.name <- 'new'
mutate_(df, .dots = setNames(list(~b), col.name))
Running just setNames(list(~b), col.name) shows how we have an expression (~b), which goes to the right of the =, while the name goes to the left of the =.
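Printing that list makes the structure explicit (the name supplies the left-hand side, the expression the right-hand side):
setNames(list(~b), col.name)
# $new
# ~b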
Scenario 2:
You want to give only a variable as a function argument. This is the simplest case. Let's again use mutate(df, new = b), but in this case we want b to be variable. We could use:
v <- 'b'
mutate_(df, .dots = setNames(list(v), 'new'))
Or simply:
mutate_(df, new = v)
Scenario 3:
You want some combination of variable and fixed parts. That is, your expression should be only partly variable. For this we use interp. For example, what if we would like to do something like:
mutate(df, new = b + 1)
But being able to change b?
v <- 'b'
mutate_(df, new = interp(~var + 1, var = as.name(v)))
Note that we use as.name to make sure that we insert the name b into the expression, not the string 'b'.
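To see the difference, compare the expressions interp builds in each case (a quick sketch):
interp(~var + 1, var = as.name(v))  # ~b + 1, the name b is inserted
interp(~var + 1, var = v)           # ~"b" + 1, the string "b" is inserted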
