Not fully understanding how SE works across the dplyr verbs - r

I'm trying to understand how SE works in dplyr so I can use variables as inputs to these functions. I'm having some trouble with understanding how this works across the different functions and when I should be doing what. It would be really good to understand the logic behind this.
Here are some examples:
library(dplyr)
library(lazyeval)
a <- c("x", "y", "z")
b <- c(1,2,3)
c <- c(7,8,9)
df <- data.frame(a, b, c)
The following is exactly why i'd use SE and the *_ variant of a function. I want to change the name of what's being mutated based on another variable.
#Normal mutate - copies b into a column called new
mutate(df, new = b)
#Mutate using a variable column names. Use mutate_ and the unqouted variable name. Doesn't use the name "new", but use the string "col.new"
col.name <- "new"
mutate_(df, col.name = "b")
#Do I need to use interp? Doesn't work
expr <- interp(~(val = b), val = col.name)
mutate_(df, expr)
Now I want to filter in the same way. Not sure why my first attempt didn't work.
#Apply the same logic to filter_. the following doesn't return a result
val.to.filter <- "z"
filter_(df, "a" == val.to.filter)
#Do I need to use interp? Works. What's the difference compared to the above?
expr <- interp(~(a == val), val = val.to.filter)
filter_(df, expr)
Now I try to select_. Works as expected
#Apply the same logic to select_, an unqouted variable name works fine
col.to.select <- "b"
select_(df, col.to.select)
Now I move on to rename_. Knowing what worked for mutate and knowing that I had to use interp for filter, I try the following
#Now let's try to rename. Qouted constant, unqouted variable. Doesn't work
new.name <- "NEW"
rename_(df, "a" = new.name)
#Do I need an eval here? It worked for the filter so it's worth a try. Doesn't work 'Error: All arguments to rename must be named.'
expr <- interp(~(a == val), val = new.name)
rename_(df, expr)
Any tips on best practice when it comes to using variable names across the dplyr functions and when interp is required would be great.

The differences here are not related to which dplyr verb you are using. They are related to where you are trying to use the variable. You are mixing whether the variable is used as a function argument or not, and whether it should be interpreted as a name or as a character string.
Scenario 1:
You want to use your variable as an argument name. Such as in your mutate example.
mutate(df, new = b)
Here new is the name of a function argument, it is left of a =. The only way to do this is to use the .dots argument. Like
col.name <- 'new'
mutate_(df, .dots = setNames(list(~b), col.name))
Running just setNames(list(~b), col.name) shows you how we have an expression (~b), which is going right of the =, and the name is going left of the =.
Scenario 2:
You want to give only a variable as a function argument. This is the simplest case. Let's again use mutate(df, new = b), but in this case we want b to be variable. We could use:
v <- 'b'
mutate_(df, .dots = setNames(list(v), 'new'))
Or simply:
mutate_(df, new = b)
Scenario 3
You want to do some combinations of variable and fixed things. That is, your expression should only be partly variable. For this we use interp. For example, what if we would like to do something like:
mutate(df, new = b + 1)
But being able to change b?
v <- 'b'
mutate_(df, new = interp(~var + 1, var = as.name(v)))
Note that we as.name to make sure that we insert b into the expression, not 'b'.

Related

Selecting Which Argument to Pass Dynamically in R

I'm trying to pass a specific argument dynamically to a function, where the function has default values for most or all arguments.
Here's a toy example:
library(data.table)
mydat <- data.table(evildeeds=rep(c("All","Lots","Some","None"),4),
capitalsins=rep(c("All", "Kinda","Not_really", "Virginal"),
each = 4),
hellprobability=seq(1, 0, length.out = 16))
hellraiser <- function(arg1 = "All", arg2= "All "){
mydat[(evildeeds %in% arg1) & (capitalsins %in% arg2), hellprobability]}
hellraiser()
hellraiser(arg1 = "Some")
whicharg = "arg1"
whichval = "Some"
#Could not get this to work:
hellraiser(eval(paste0(whicharg, '=', whichval)))
I would love a way to specify dynamically which argument I'm calling: In other words, get the same result as hellraiser(arg1="Some") but while picking whether to send arg1 OR arg2 dynamically. The goal is to be able to call the function with only one parameter specified, and specify it dynamically.
You could use some form of do.call like
do.call("hellraiser", setNames(list(whichval), whicharg))
but really this just seems like a bad way to handle arguments for your functions. It might be better to treat your parameters like a list that you can more easily manipulate. Here's a version that allows you to choose values where the argument names are treated like column names
hellraiser2 <- function(..., .dots=list()) {
dots <- c(.dots, list(...))
expr <- lapply(names(dots), function(x) bquote(.(as.name(x)) %in% .(dots[[x]])))
expr <- Reduce(function(a,b) bquote(.(a) & .(b)), expr)
eval(bquote(mydat[.(expr), hellprobability]))
}
hellraiser2(evildeeds="Some", capitalsins=c("Kinda","Not_really"))
hellraiser2(.dots=list(evildeeds="Some", capitalsins=c("Kinda","Not_really")))
This use of ... and .dots= syntax is borrowed from the dplyr standard evaluation functions.
I managed to get the result with
hellraiser(eval(parse(text=paste(whicharg, ' = \"', whichval, '\"', sep=''))))

Check expression argument of function

When writing functions it is important to check for the type of arguments. For example, take the following (not necessarily useful) function which is performing subsetting:
data_subset = function(data, date_col) {
if (!TRUE %in% (is.character(date_col) | is.expression(date_col))){
stop("Input variable date is of wrong format")
}
if (is.character(date_col)) {
x <- match(date_col, names(data))
} else x <- match(deparse(substitute(date_col)), names(data))
sub <- data[,x]
}
I would like to allow the user to provide the column which should be extracted as character or expression (e.g. a column called "date" vs. just date). At the beginning I would like to check that the input for date_col is really either a character value or an expression. However, 'is.expression' does not work:
Error in match(x, table, nomatch = 0L) : object '...' not found
Since deparse(substitute)) works if one provides expressions I thought 'is.expression' has to work as well.
What is wrong here, can anyone give me a hint?
I think you are not looking for is.expression but for is.name.
The tricky part is to get the type of date_col and to check if it is of type character only if it is not of type name. If you called is.character when it's a name, then it would get evaluated, typically resulting in an error because the object is not defined.
To do this, short circuit evaluation can be used: In
if(!(is.name(substitute(date_col)) || is.character(date_col)))
is.character is only called if is.name returns FALSE.
Your function boils down to:
data_subset = function(data, date_col) {
if(!(is.name(substitute(date_col)) || is.character(date_col))) {
stop("Input variable date is of wrong format")
}
date_col2 <- as.character(substitute(date_col))
return(data[, date_col2])
}
Of course, you could use if(is.name(…)) to convert only to character when date_col is a name.
This works:
testDF <- data.frame(col1 = rnorm(10), col2 = rnorm(10, mean = 10), col3 = rnorm(10, mean = 50), rnorm(10, mean = 100))
data_subset(testDF, "col1") # ok
data_subset(testDF, col1) # ok
data_subset(testDF, 1) # Error in data_subset(testDF, 1) : Input variable date is of wrong format
However, I don't think you should do this. Consider the following example:
var <- "col1"
data_subset(testDF, var) # Error in `[.data.frame`(data, , date_col2) : undefined columns selected
col1 <- "col2"
data_subset(testDF, col1) # Gives content of column 1, not column 2.
Though this "works as designed", it is confusing because unless carefully reading your function's documentation one would expect to get col1 in the first case and col2 in the second case.
Abusing a famous quote:
Some people, when confronted with a problem, think “I know, I'll use non-standard evaluation.” Now they have two problems.
Hadley Wickham in Non-standard evaluation:
Non-standard evaluation allows you to write functions that are extremely powerful. However, they are harder to understand and to program with. As well as always providing an escape hatch, carefully consider both the costs and benefits of NSE before using it in a new domain.
Unless you expect large benefits from allowing to skip the quotes around the name of the column, don't do it.

Passing an evaluated expression as named arg to function within function (R)

I would like to expand a data.frame to include a new column & give this column a dynamically assigned name passed within a function. Here is a simplified example:
passMyName <-function(df, newColTitle) {
df2 <-data.frame(df, newColTitle = rep(NA, nrow(df)))
return(df2)
}
randomDF <-data.frame(a=1:3, b=4:6, c=7:9)
passMyName(randomDF, myCustomColTitle)
This will add a new column to the data.frame with the variable name defined by the function, "newColTitle". Instead, I would like to evaluate newColTitle as "myCustomColTitle" and pass this as the named tag for this new column, such that the output data.fram contains a new column called "myCustomColTitle".
Is this possible? Perhaps via substitute(), deparse(), quote(), eval(), paste() etc. You can assume that I've read the following primer on this type of topic (several times actually):
http://adv-r.had.co.nz/Computing-on-the-language.html
But some details have not quite stuck with me.
Thanks
I think this would make much more sense if you passed the new name in as a character value. For example
passMyName <-function(df, newColTitle) {
df2 <- cbind(df, setNames(list(rep(NA, nrow(df))), newColTitle))
return(df2)
}
randomDF <-data.frame(a=1:3, b=4:6, c=7:9)
passMyName(randomDF , "myCustomColTitle")
but if you really wanted to use an unevaluated expression, you can use
passMyName <-function(df, newColTitle) {
newColTitle <- deparse(substitute(newColTitle))
df2 <- cbind(df, setNames(list(rep(NA, nrow(df))), newColTitle))
return(df2)
}
randomDF <-data.frame(a=1:3, b=4:6, c=7:9)
passMyName(randomDF, myCustomColTitle)

character string as function argument r

I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to create a function with this code. It should take a name of a column of the data frame that is manipulated by dplyr. However, trying to work with columnnames does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
df %>%
filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function which accepts the string "B" as an argument (as in you question's title)
foo_string <- function (x) {
eval(substitute(df %>% filter(xx < 1),list(xx=as.name(x))))
}
foo_string("B")
If you want to create a function which accepts captures B as an argument (as in dplyr)
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
eval(substitute(df %>% filter(xx < 1),list(xx=x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3. Functions with suffixes "_" accept a string or an expression as an argument
foo_string <- function (x) {
# construct the string
string <- paste(x,"< 1")
# use filter_ instead of filter
df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
# capture the argument without evaluating it
x <- substitute(x)
# construct the expression
expression <- lazyeval::interp(quote(xx < 1), xx = x)
# use filter_ instead of filter
df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
I remember a similar question which was answered by #Richard Scriven. I think you need to write something like this.
foo <- function(x,...)filter(x,...)
What #Richard Scriven mentioned was that you need to use ... here. If you type ?dplyr, you will be able to find this: filter(.data, ...) I think you replace .data with x or whatever. If you want to pick up rows which have values smaller than 1 in B in your df, it will be like this.
foo <- function (x,...) filter(x,...)
foo(df, B < 1)

Passing expression through functions

I'm using data.table package and trying to write a function (shown below):
require(data.table)
# Function definition
f = function(path, key) {
table = data.table(read.delim(path, header=TRUE))
e = substitute(key)
setkey(table, e) # <- Error in setkeyv(x, cols, verbose = verbose) : some columns are not in the data.table: e
return(table)
}
# Usage
f("table.csv", ID)
Here I try to pass an expression to the function. Why this code doesn't work?
I've already tried different combinations of substitute(), quote() and eval(). So, it'd be great if you could also explain how to get this to work.
First, let's look at how the setkey function does things from the data.table package:
# setkey function
function (x, ..., verbose = getOption("datatable.verbose"))
{
if (is.character(x))
stop("x may no longer be the character name of the data.table. The possibility was undocumented and has been removed.")
cols = getdots()
if (!length(cols))
cols = colnames(x)
else if (identical(cols, "NULL"))
cols = NULL
setkeyv(x, cols, verbose = verbose)
}
So, when you do:
require(data.table)
dt <- data.table(ID=c(1,1,2,2,3), y = 1:5)
setkey(dt, ID)
It calls the function getdots which is internal to data.table (that is, it's not exported). Let's have a look at that function:
# data.table:::getdots
function ()
{
as.character(match.call(sys.function(-1), call = sys.call(-1),
expand.dots = FALSE)$...)
}
So, what does this do? It takes the parameter you entered in setkey and it uses match.call to extract the arguments separately. That is, the match.call argument for this example case would be:
setkey(x = dt, ... = list(ID))
and since it's a list, you can access the ... parameter with $... to get a list of 1 element with its value ID and converting to this list to a character with as.character results in "ID" (a character vector). And then setkey passes this to setkeyv internally to set the keys.
Now why doesn't this work when you write setkey(table, key) inside your function?
This is precisely because of the way setkey/getdots is. The setkey function is designed to take any argument after the first argument (which is a data.table) and then return the ... argument as a character.
That is, if you give setkey(dt, key) then it'll return cols <- "key". If you give setkey(dt, e), it'll give back cols <- "e". It doesn't look for if "key" is an existing variable and then if so substitute the value of the variable. All it does is convert the value you provide (whether it be a symbol or character) back to a character.
Of course this won't work in your case because you want the value in key = ID to be provided in setkey. At least I can't think of a way to do this.
How to get around this?
As #agstudy already mentions, the best/easiest way is to pass "ID" and use setkeyv. But, if you really insist on using f("table.csv", ID) then, this is what you could do:
f <- function(path, key) {
table = data.table(read.delim(path, header=TRUE))
e = as.character(match.call(f)$key)
setkeyv(table, e)
return(table)
}
Here, you first use match.call to get the value corresponding to argument key and then convert it to a character and then pass that to setkeyv.
In short, setkey internally uses setkeyv. And imho, setkey is a convenient function to be used when you already know the column name of the data.table for which you need to set the key. Hope this helps.
I can't tell from your code what you're trying to achieve, so I'll answer the question the title asks instead; "How to pass an expression through a function?"
If you want to do this (this should be avoided where possible), you can do the following:
f <- function(expression) {
return(eval(parse(text=expression)))
}
For example:
f("a <- c(1,2,3); sum(a)")
# [1] 6

Resources