I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to create a function with this code. It should take the name of a column of the data frame that is manipulated by dplyr. However, passing column names does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
df %>%
filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function which accepts the string "B" as an argument (as in your question's title):
foo_string <- function (x) {
eval(substitute(df %>% filter(xx < 1),list(xx=as.name(x))))
}
foo_string("B")
If you want to create a function which captures the bare name B as an argument (as dplyr does):
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
eval(substitute(df %>% filter(xx < 1),list(xx=x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3. Functions with the suffix "_" accept a string or an expression as an argument:
foo_string <- function (x) {
# construct the string
string <- paste(x,"< 1")
# use filter_ instead of filter
df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
# construct the expression
expression <- lazyeval::interp(quote(xx < 1), xx = x)
# use filter_ instead of filter
df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
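For readers on a recent dplyr: the underscore verbs (filter_ and friends) were deprecated in dplyr 0.7, so the two functions above are nowadays usually written with the .data pronoun and the {{ }} operator. This is only a minimal sketch of that style, not part of the original answer; the argument names data and col are my own choice, and {{ }} assumes rlang 0.4.0 or later.

```r
library(dplyr)

df <- data.frame(A = seq(-5, 5, 1), B = seq(0, 10, 1))

# String interface: look the column up by name through the .data pronoun.
foo_string <- function(data, col) {
  data %>% filter(.data[[col]] < 1)
}

# Bare-name interface: embrace the argument so filter() sees the column.
foo_nse <- function(data, col) {
  data %>% filter({{ col }} < 1)
}

res_string <- foo_string(df, "B")
res_nse <- foo_nse(df, B)
```

Both calls should keep only the rows where B is below 1. Passing the data frame as an argument, rather than reading df from the global environment, also makes the functions easier to reuse.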
I remember a similar question which was answered by @Richard Scriven. I think you need to write something like this.
foo <- function(x,...)filter(x,...)
What @Richard Scriven mentioned was that you need to use ... here. If you type ?dplyr, you will be able to find this: filter(.data, ...). I think you can replace .data with x or whatever name you like. If you want to pick up the rows of your df that have values smaller than 1 in B, it will look like this.
foo <- function (x,...) filter(x,...)
foo(df, B < 1)
create_c <- function(df, line_number = NA, prior_trt, line_name, biomarker, ...) {
if (!"data.frame" %in% class(df)) {
stop("First input must be dataframe")
}
# handle extra arguments
args <- enquos(...)
names(args) <- tolower(names(args))
# check for unknown argument - cols that do not exist in df
check_args_exist(df, args)
# argument to expression
ex_args <- unname(imap(args, function(expr, name) quo(!!sym(name) == !!expr)))
# special case arguments
if (!missing(line_number)) {
df <- df %>% filter(line_number %in% (!!line_number))
if (!missing(prior_trt)) {
df <- filter_arg(df. = df, arg = prior_trt, col = "prior_trt_", val = "y")
}
}
if (!missing(biomarker)) {
df <- filter_arg(df. = df, arg = biomarker, col = "has_", val = "positive")
}
if (!missing(line_name)) {
ln <- list()
if (!!str_detect(line_name[1], "or")) {
line_name <- str_split(line_name, " or ", simplify = TRUE)
}
for (i in 1:length(line_name)) {
ln[[i]] <- paste(tolower(sort(strsplit(line_name[i], "\\+")[[1]])), collapse = ",")
}
df <- df %>% filter(line_name %in% (ln))
}
df <- df %>%
group_by(patient_id) %>%
slice(which.min(line_number)) %>%
ungroup()
df <- df %>% filter(!!!ex_args)
invisible(df)
}
I have this function where I am basically filtering various columns based on parameters users pass. I want the users to be able to pass logical operators like >,<, != for some of the parameters. Right now my function is not able to handle any other operators besides '='. Is there a way to accomplish this?
create_c(df = bsl_all_nsclc,
line_number > 2)
create_c(df, biomarker != "positive")
Error in tolower(arg) : object 'biomarker' not found
Certainly there is a way: operators are regular functions in R, you can pass them around like any other function.
The only complication is that the operators are non-syntactic names so you can’t just pass them “as is”, this would confuse the parser. Instead, you need to wrap them in backticks, to make their use syntactically valid where a name would be expected:
filter_something = function (value, op) {
op(value, 13)
}
filter_something(cars$speed, `>`)
filter_something(cars$speed, `<`)
filter_something(cars$speed, `==`)
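The same idea carries over to a data frame filter. The following is just a sketch under my own assumptions: filter_by_op is a hypothetical helper (it is not from the question), col is a column name given as a string, and the .data/.env pronouns require a reasonably recent dplyr/rlang.

```r
library(dplyr)

# Hypothetical helper: `col` is a column name (string), `op` a comparison
# operator (the function itself or its name), `value` the right-hand side.
filter_by_op <- function(df, col, op, value) {
  op <- match.fun(op)  # accepts both `>` and ">"
  # .data[[col]] looks up the column; .env$value forces the function argument
  # even if the data frame happens to have a column called "value".
  filter(df, op(.data[[col]], .env$value))
}

fast <- filter_by_op(cars, "speed", `>`, 20)
short <- filter_by_op(cars, "dist", "<=", 10)
```

Using match.fun means the caller can pass either the backticked operator or its name as a string, which is convenient when the operator comes from user input.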
And since R also supports non-standard evaluation of function arguments, you can also pass unevaluated expressions — this gets slightly more complicated, since you’d want to evaluate them in the correct context. ‘rlang’/‘dplyr’ uses data masking for this.
How exactly you need to apply this depends entirely on the context in which the expression is to be used. In many cases, you can simply dispatch them to the corresponding ‘dplyr’ functions, e.g.
filter_something2 = function (.data, expr) {
.data %>%
filter({{expr}})
}
filter_something2(cars, speed < 13)
The “secret sauce” here is the {{…}} syntax. This works because filter from ‘dplyr’ accepts unevaluated arguments and handles {{expr}} specially by transforming it into (effectively) !! enquo(expr). That is: expr is first “defused”: it is explicitly marked as unevaluated, and the name expr is replaced by the unevaluated expression it binds to (speed < 13 in the above), together with the environment it came from. Next, this unevaluated expression is unquoted. That is, the wrapper is “peeled off” from the expression, and the unevaluated expression itself is handled inside filter as if it had been passed as filter(.data, speed < 13). In other words: the name expr is substituted with speed < 13 in the call expression.
For a more thorough explanation, please refer to the Programming with dplyr vignette.
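To make the equivalence concrete, here is a small sketch comparing the embrace operator with its spelled-out form, plus a variant that forwards several conditions through the dots. The function names are mine and purely illustrative; rlang documents {{ x }} as shorthand for !!enquo(x).

```r
library(dplyr)

# Embracing: capture the argument and unquote it in one step.
filter_embrace <- function(df, expr) {
  filter(df, {{ expr }})
}

# The same thing written out with enquo() and !!.
filter_enquo <- function(df, expr) {
  expr <- enquo(expr)
  filter(df, !!expr)
}

# Several conditions at once: simply forward the dots to filter().
filter_dots <- function(df, ...) {
  filter(df, ...)
}

a <- filter_embrace(cars, speed < 13)
b <- filter_enquo(cars, speed < 13)
d <- filter_dots(cars, speed < 13, dist != 10)
```

The first two calls should return the same rows. The dots version is probably the closest fit for a function like create_c, since callers can then write conditions such as line_number > 2 directly.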
SHORT SUMMARY
dplyr unquoting fails inside summarise when the quoted object is an argument of a function that calls summarise, and that argument is assigned in a for loop.
For Loop
for(j in 1:1){
sumvar <- paste0("randnum",j)
chkfunc(sumvar)
}
Function (abbreviated here, shown in full below)
chkfunc <- function(sumvar) {
sumvar <- enquo(sumvar)
[...]
summarise(mn = mean(!!sumvar))
LONG SUMMARY
I have two columns that sometimes contain NAs and I want to use dplyr non-standard evaluation and its famous unquoting (AKA bang bang !!) to summarise each column in one for loop.
library(dplyr)
set.seed(3)
randnum1 <- rnorm(10)
randnum1[randnum1<0] <- NA
randnum2 <- rnorm(10)
randnum2[randnum2<0] <- NA
randfrm <- data.frame(cbind(randnum1, randnum2))
print(randfrm)
We see below that the filter function processes the unquoting (!!) just fine but the summarise function fails, returning an "argument is not numeric or logical" error. The same occurs when I use := in the summarise function call (not shown here), which appeared in the "Programming with dplyr" vignette. Finally, I confirmed that the class of !!sumvar is numeric within function chkfunc.
chkfunc <- function(sumvar) {
sumvar <- enquo(sumvar)
message("filter function worked with !!sumvar")
outfrm <- randfrm %>%
filter(!is.na(!!sumvar))
print(outfrm)
message("summarise function failed with !!sumvar")
outfrm <- randfrm %>%
filter(!is.na(!!sumvar)) %>%
summarise(mn = mean(!!sumvar))
}
# Just one iteration to avoid confusion
for(j in 1:1){
sumvar <- paste0("randnum",j)
chkfunc(sumvar)
}
While I would like an answer using dplyr, the following works with substitute and eval rather than using dplyr functions (answer adapted from Akrun's answer to StackOverflow question "Unquote string in R's substitute command"):
chkfunc <- function(sumvar) {
outfrm <- eval(substitute(randfrm %>%
filter(!is.na(y)) %>%
summarise(mn = mean(y)),
list(y=as.name(sumvar))))
print(outfrm)
}
for(j in 1:2){
sumvar <- paste0("randnum",j)
chkfunc(sumvar)
}
print(outfrm)
Finally, I'll note that while the pull function on !!sumvar showed the resulting class to be numeric (i.e., the same class and values as randfrm$randnum1), I figured out that !!sumvar is treated as a character string (i.e., "randnum1") in both my use of filter and summarise, hence the "argument is not numeric or logical" warning.
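Given that diagnosis, one possible dplyr-only fix is to stop capturing the argument with enquo and instead look the column up by its string name. This is a sketch under that assumption, not a confirmed answer from the thread; passing the data frame in as df is my own change, and !!sym(sumvar) (sym comes from rlang and is re-exported by dplyr) would be an alternative to .data[[sumvar]].

```r
library(dplyr)

set.seed(3)
randnum1 <- rnorm(10)
randnum1[randnum1 < 0] <- NA
randnum2 <- rnorm(10)
randnum2[randnum2 < 0] <- NA
randfrm <- data.frame(randnum1, randnum2)

# sumvar is a string here, so the column is fetched with .data[[sumvar]].
chkfunc <- function(df, sumvar) {
  df %>%
    filter(!is.na(.data[[sumvar]])) %>%
    summarise(mn = mean(.data[[sumvar]]))
}

for (j in 1:2) {
  print(chkfunc(randfrm, paste0("randnum", j)))
}
```

The loop should print one mean per column, computed over the non-missing rows only.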
I'm trying to understand how SE works in dplyr so I can use variables as inputs to these functions. I'm having some trouble with understanding how this works across the different functions and when I should be doing what. It would be really good to understand the logic behind this.
Here are some examples:
library(dplyr)
library(lazyeval)
a <- c("x", "y", "z")
b <- c(1,2,3)
c <- c(7,8,9)
df <- data.frame(a, b, c)
The following is exactly why I'd use SE and the *_ variant of a function. I want to change the name of what's being mutated based on another variable.
#Normal mutate - copies b into a column called new
mutate(df, new = b)
#Mutate using a variable column name. Use mutate_ and the unquoted variable name. Doesn't use the name "new", but uses the literal name "col.name"
col.name <- "new"
mutate_(df, col.name = "b")
#Do I need to use interp? Doesn't work
expr <- interp(~(val = b), val = col.name)
mutate_(df, expr)
Now I want to filter in the same way. Not sure why my first attempt didn't work.
#Apply the same logic to filter_. the following doesn't return a result
val.to.filter <- "z"
filter_(df, "a" == val.to.filter)
#Do I need to use interp? Works. What's the difference compared to the above?
expr <- interp(~(a == val), val = val.to.filter)
filter_(df, expr)
Now I try to select_. Works as expected
#Apply the same logic to select_, an unquoted variable name works fine
col.to.select <- "b"
select_(df, col.to.select)
Now I move on to rename_. Knowing what worked for mutate and knowing that I had to use interp for filter, I try the following
#Now let's try to rename. Quoted constant, unquoted variable. Doesn't work
new.name <- "NEW"
rename_(df, "a" = new.name)
#Do I need an eval here? It worked for the filter so it's worth a try. Doesn't work 'Error: All arguments to rename must be named.'
expr <- interp(~(a == val), val = new.name)
rename_(df, expr)
Any tips on best practice when it comes to using variable names across the dplyr functions and when interp is required would be great.
The differences here are not related to which dplyr verb you are using. They are related to where you are trying to use the variable. You are mixing whether the variable is used as a function argument or not, and whether it should be interpreted as a name or as a character string.
Scenario 1:
You want to use your variable as an argument name. Such as in your mutate example.
mutate(df, new = b)
Here new is the name of a function argument: it sits to the left of the =. The only way to do this is to use the .dots argument, like so:
col.name <- 'new'
mutate_(df, .dots = setNames(list(~b), col.name))
Running just setNames(list(~b), col.name) shows you how we have an expression (~b), which is going right of the =, and the name is going left of the =.
Scenario 2:
You want to give only a variable as a function argument. This is the simplest case. Let's again use mutate(df, new = b), but in this case we want b to be variable. We could use:
v <- 'b'
mutate_(df, .dots = setNames(list(v), 'new'))
Or simply:
mutate_(df, new = v)
Scenario 3:
You want to do some combinations of variable and fixed things. That is, your expression should only be partly variable. For this we use interp. For example, what if we would like to do something like:
mutate(df, new = b + 1)
But being able to change b?
v <- 'b'
mutate_(df, new = interp(~var + 1, var = as.name(v)))
Note that we use as.name to make sure that we insert b into the expression, not 'b'.
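For comparison, the three scenarios can be sketched with the tidy evaluation tools that replaced the underscore verbs (which were deprecated in dplyr 0.7). This is my own hedged translation rather than part of the original answer; it assumes a dplyr version that supports := and the .data pronoun.

```r
library(dplyr)

df <- data.frame(a = c("x", "y", "z"), b = c(1, 2, 3), c = c(7, 8, 9))

# Scenario 1: the name of the new column comes from a variable.
col.name <- "new"
s1 <- mutate(df, !!col.name := b)

# Scenario 2: the column to copy is given as a string.
v <- "b"
s2 <- mutate(df, new = .data[[v]])

# Scenario 3: part of the expression is variable, part is fixed.
s3 <- mutate(df, new = .data[[v]] + 1)

# The rename case from the question follows the Scenario 1 pattern.
new.name <- "NEW"
s4 <- rename(df, !!new.name := a)
```

The := operator is needed in Scenario 1 because R's grammar does not allow an unquoting expression on the left of a plain =.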
Let's say I have a (dplyr/tibble) data-frame/tbl constructed like so:
df <- data_frame(x = 1:10)
Now, I'd like to use this within a function that works with df via some dplyr verbs, like so:
myfun <- function(df, x) {
x <- doSomeStuffTo(x)
filter(df, x == x)
}
But this will always return the full df... I'm trying to figure out a way to implement scoping within a dplyr verb, something like:
filter_(df, ~x == x)
... which doesn't work, either. In some other languages, you might be able to achieve this via something like:
df.filter(this.x == x)
... where this refers to the df instance.
My only work-around so far is naming the function's variable like so:
myfun <- function(df, query_x) {
query_x <- doSomeStuffTo(query_x)
filter(df, x == query_x)
}
I suspect this is doable (without using a name like query_x) somehow with SE dplyr verbs (e.g. filter_), but I haven't stumbled upon the correct pattern yet. Anyone here have the answer?
To dynamically build different dplyr commands you typically use the standard evaluation versions of the functions (the ones with the underscores) and the lazyeval package. Here's how you could change your function
doSomeStuffTo <- function(x) {x+1}
myfun <- function(df, x) {
x <- doSomeStuffTo(x)
filter_(df, lazyeval::interp(~x == y, y=x))
}
df <- data_frame(x = 1:10)
myfun(df,3)
But even with interp we can't write x == x, because it would not be clear which x you want to replace. Both filter(df, 3==x) and filter(df, x==3) work with dplyr. You can have constants or column names on either side of the equality.
If you use filter_ you can pass logical expressions via quote:
myfun <- function(df, t) {
df$x <- 5*df$x
filter_(df, t )
}
> myfun(df, t= quote(x < 25) )
# A tibble: 4 x 1
x
<dbl>
1 5
2 10
3 15
4 20
I stumbled into the same issue. Instead of wrangling with even more complex evaluations, it's usually easier to just rename the function argument. Like this:
myfun <- function(df, x) {
x_ <- doSomeStuffTo(x)
filter(df, x == x_)
}
This solution is still dangerous because we might hit another variable called x_. One can be defensive about this by checking the variable names in df and making sure to pick one that isn't there. Or more lazily, one can use very implausible variable names. I often use stuff like _____temp.
Maybe the new dplyr 0.6.0 evaluation system will handle this better. See the notes about the new system, tidyeval.
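As a follow-up sketch of what that newer system allows (assuming a dplyr/rlang recent enough to provide the .data and .env pronouns): the column and the function argument can share the name x if each is qualified explicitly. doSomeStuffTo is the same placeholder used earlier in this thread.

```r
library(dplyr)

doSomeStuffTo <- function(x) x + 1  # placeholder, as in the answer above

myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  # .data$x is the column, .env$x is the function's local variable.
  filter(df, .data$x == .env$x)
}

df <- tibble(x = 1:10)
res <- myfun(df, 3)
```

An alternative with the same effect is filter(df, .data$x == !!x), which inlines the value of the local x before the filter is evaluated.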
Thanks in advance, and sorry if this question has been answered previously - I have looked pretty extensively. I have a dataset containing a row with concatenated information, specifically: name,color code,some function expression. For example, one value may be:
cost#FF0033#log(x)+6.
I have all of the code to extract the information, and I end up with a vector of expressions that I would like to convert to a list of actual functions.
For example:
func.list <- list()
test.func <- c("x","x+1","x+2","x+3","x+4")
where test.func is the vector of expressions. What I would like is:
func.list[[3]]
To be equivalent to
function(x){x+3}
I know that I can create a function using:
somefunc <- function(x){eval(parse(text="x+1"))}
to convert a character value into a function. The problem comes when I try and loop through to make multiple functions. For an example of something I tried that didn't work:
for(i in 1:length(test.func)){
temp <- test.func[i]
f <- assign(function(x){eval(expr=parse(text=temp))})
func.list[[i]] <- f
}
Based on another post (http://stats.stackexchange.com/questions/3836/how-to-create-a-vector-of-functions) I also tried this:
makefunc <- function(y){y;function(x){y}}
for(i in 1:length(test.func)){
func.list[[i]] <- assign(x=paste("f",i,sep=""),value=makefunc(eval(parse(text=test.func[i]))))
}
Which gives the following error: Error in eval(expr, envir, enclos) : object 'x' not found
The eventual goal is to take the list of functions and apply the jth function to the jth column of the data.frame, so that the user of the script can specify how to normalize each column within the concatenated information given by the column header.
Maybe initialize your list with a single generic function, and then update them using:
foo <- function(x){x+3}
> body(foo) <- quote(x+4)
> foo
function (x)
x + 4
More specifically, starting from a character, you'd probably do something like:
body(foo) <- parse(text = "x+5")
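Wrapped into a small factory, that might look like the sketch below. make_fun is a name I made up for illustration. The earlier attempt failed because eval(parse(...)) ran immediately, at a point where no x existed; here the text is only parsed, and evaluation is deferred until the generated function is called.

```r
# Hypothetical factory: turn one expression string into function(x) <expr>.
make_fun <- function(txt) {
  f <- function(x) NULL
  body(f) <- parse(text = txt)[[1]]  # parse only, do not evaluate yet
  f
}

test.func <- c("x", "x+1", "x+2", "x+3", "x+4")
func.list <- lapply(test.func, make_fun)

# Apply the j-th function to the j-th column of a data frame.
test.data <- data.frame(matrix(1, 5, 5))
processed <- Map(function(f, col) f(col), func.list, test.data)
```

Since the strings end up being executed as code, this is only sensible when the expressions come from a trusted source.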
Just to add onto joran's answer, this is what finally worked:
test.data <- matrix(data=rep(1,25),5,5)
test.data <- data.frame(test.data)
test.func <- c("x","x+1","x+2","x+3","x+4")
func.list <- list()
for(i in 1:length(test.func)){
func.list[[i]] <- function(x){}
body(func.list[[i]]) <- parse(text=test.func[i])
}
processed <- mapply(do.call,func.list,lapply(test.data,list))
Thanks again, joran.
This is what I do:
f <- list(identity="x",plus1 = "x+1", square= "x^2")
funCreator <- function(snippet){
txt <- snippet
function(x){
exprs <- parse(text = txt)
eval(exprs)
}
}
listOfFunctions <- lapply(setNames(f,names(f)),function(x){funCreator(x)}) # I like to have some control of the names of the functions
listOfFunctions[[1]] # try to see what the actual function looks like?
library(pryr)
unenclose(listOfFunctions[[3]]) # good way to see the actual function http://adv-r.had.co.nz/Functional-programming.html
# Call your functions
listOfFunctions[[2]](3) # 3+1 = 4
do.call(listOfFunctions[[3]],list(3)) # 3^2 = 9
attach(listOfFunctions) # you can also attach your list of functions and call them by name
square(3) # 3^2 = 9
identity(7) # 7 ## masked object identity, better detach it now!
detach(listOfFunctions)