I'd like to use a utility function to check whether a given column exists within a given data.frame. I'm piping within the tidyverse. The best I've come up with so far is
library(magrittr)
columnExists <- function(data, col) {
  tryCatch({
    rlang::as_label(rlang::enquo(col)) %in% names(data)
  },
  error = function(e) FALSE
  )
}
This works in the global environment:
> mtcars %>% columnExists(mpg)
[1] TRUE
> mtcars %>% columnExists(bad)
[1] FALSE
But it does not work when called from within another function, which is my actual use case:
outerFunction <- function(d, col) {
  d %>% columnExists((col))
}
> mtcars %>% outerFunction(mpg) # Expected TRUE
[1] FALSE
> mtcars %>% outerFunction(bad) # Expected FALSE
[1] FALSE
What am I doing wrong? Is it possible to have a single function that works correctly in the global environment and also when nested in another function?
I have found several SO posts related to checking for the existence of a given column or columns, but they all seem to assume either that the column name will be passed as a string or the call to check existence is not nested (or both). That is not the case here.
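You want to pass through the original symbol in your outerFunction. Use
outerFunction <- function(d, col) {
  d %>% columnExists( {{col}} )
}
The "embrace" syntax {{ }} forwards the caller's original expression (for example mpg) to enquo() inside columnExists(). With (col), enquo() instead captures the local expression (col), whose label is never a column name.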
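Here is a minimal, self-contained sketch of the fix. It assumes the magrittr and rlang packages are installed. It drops the question's tryCatch(). The reasoning is that as_label() only deparses the captured expression and never evaluates it, so a missing column cannot raise an error here.

```r
library(magrittr)

# TRUE if the bare column name passed as `col` is a column of `data`.
# as_label() turns the captured expression into a string without evaluating it.
columnExists <- function(data, col) {
  rlang::as_label(rlang::enquo(col)) %in% names(data)
}

# {{ col }} forwards the caller's expression (e.g. `mpg`), so enquo()
# inside columnExists() sees `mpg` rather than the local name `col`.
outerFunction <- function(d, col) {
  d %>% columnExists({{ col }})
}

mtcars %>% columnExists(mpg)   # TRUE
mtcars %>% outerFunction(mpg)  # TRUE
mtcars %>% outerFunction(bad)  # FALSE
```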
require(magrittr)
require(purrr)
is.out.same <- function(.call, ...) {
  ## Checks if args in .call will produce identical output in other functions
  call <- substitute(.call)                # Captures function call
  f_names <- eval(substitute(alist(...)))  # Makes list of f_names
  map2(rep(list(call), length(f_names)),   # Creates list of new function calls
       f_names,
       function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
  ) %>%
    map(eval) %>%                          # Evaluates function calls
    map_lgl(identical, x = .call) %>%      # Checks output of new calls against output of original call
    all()                                  # Returns TRUE if calls produce identical outputs
}
is.out.same(map(1:3, cumsum), lapply) # This works
map(1:3, cumsum) %>% # Is there any way to make this work?
is.out.same(lapply)
My function takes a function call as an argument.
Is there any way of making my function pipeable? Right now, the problem is that whatever call I put on the left-hand side is evaluated before it is piped in, so my function only receives its result. The only thing I can think of is using a function to 'unevaluate' the value, but this doesn't seem possible.
I wouldn't recommend actually doing this. The pipe operator is designed to make it easy to pass the output of one function as the input of the next, but that's not really what you're doing here at all: you want to manipulate the entire call stack. It is technically possible, though. You just need to do some extra work to find the chain "meta-data" and see what was originally passed in. Here are two helper functions that extract the relevant info.
find_chain_parts <- function() {
  i <- 1
  while (!("chain_parts" %in% ls(envir = parent.frame(i))) && i < sys.nframe()) {
    i <- i + 1
  }
  parent.frame(i)
}
find_lhs <- function(x) {
  env <- find_chain_parts()
  if (exists("chain_parts", env)) {
    return(env$chain_parts$lhs)
  } else {
    return(do.call("substitute", list(substitute(x), parent.frame())))
  }
}
These functions walk up the call stack to find the original pipe call. If one is present, they extract the expression on the left-hand side; if not, they simply substitute the original parameter. Note that this relies on the chain_parts variable that magrittr 1.x creates internally, which magrittr 2.0 no longer creates. You would just change your function to use
is.out.same <- function(.call, ...) {
  call <- find_lhs(.call)                  # Captures function call
  f_names <- eval(substitute(alist(...)))  # Makes list of f_names
  map2(rep(list(call), length(f_names)),   # Creates list of new function calls
       f_names,
       function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
  ) %>%
    map(eval) %>%                          # Evaluates function calls
    map_lgl(identical, x = .call) %>%      # Checks output of new calls against output of original call
    all()                                  # Returns TRUE if calls produce identical outputs
}
Then both of these work:
is.out.same(map(1:3, cumsum), lapply)
# [1] TRUE
map(1:3, cumsum) %>%
is.out.same(lapply)
# [1] TRUE
But if you are really testing expressions for functional equivalence, it would make much more sense to pass in quosures. Then you wouldn't need the different branches. Such a function could look like this:
library(rlang)
is.out.same <- function(call, ...) {
  f_names <- eval(substitute(alist(...)))  # Makes list of f_names
  map2(rep(list(call), length(f_names)),   # Creates list of new quosures
       f_names,
       function(.x, .y) {
         expr <- quo_get_expr(.x)          # Unwrap the call inside the quosure
         expr[[1]] <- .y                   # Swap in the alternative function
         quo_set_expr(.x, expr)            # Re-wrap, keeping the quosure's environment
       }
  ) %>%
    map(eval_tidy) %>%                           # Evaluates function calls
    map_lgl(identical, x = eval_tidy(call)) %>%  # Checks output of new calls against output of original call
    all()                                        # Returns TRUE if calls produce identical outputs
}
and you would call it in either of the following ways:
is.out.same(quo(map(1:3, cumsum)), lapply)
quo(map(1:3, cumsum)) %>%
is.out.same(lapply)
This makes the intent much clearer in my opinion.
I'm running several tests on a given object x. For a single test (a function that returns TRUE or FALSE when applied to an object) this is easy, as you can do lapply(x, test). For example:
# This returns list(TRUE)
lapply('a', is.character)
However, I would like to create a function pass_tests that can combine multiple tests, so that I could run something like this:
pass_tests('a', is.character | is.numeric)
That is, it should take several test functions in a single argument and combine their results when testing an object x. In this case, it would return whether 'a' is character OR numeric, which is TRUE. The following line should return FALSE:
pass_tests('a', is.character & is.numeric)
The idea is that it should be flexible enough for different combinations, e.g.:
pass_tests(x, test1 & (test2 | test3))
Any idea if functions can be logically combined this way?
Another option would be to use the pipe with a braced expression:
library(magrittr) # or dplyr
"a" %>% {is.character(.) & is.numeric(.)}
#FALSE
"a" %>% {is.character(.) | is.numeric(.)}
#TRUE
1 %>% {is.finite(.) & (is.character(.) | is.numeric(.))}
#TRUE
Edit: used in a function that takes the test as a string:
pass_test <- function(x, expr) {
  x %>% {eval(parse(text = expr))}
}
pass_test(1, "is.finite(.) & (is.character(.) | is.numeric(.))")
#TRUE
The argument expr can be a string or an expression as in expression(is.finite(.) & (is.character(.) | is.numeric(.))).
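You may want the exact call syntax the question asks for. One possible approach is to capture the test expression unevaluated with substitute() and rewrite every bare function name into a call on the object. This is a hypothetical sketch rather than either answer's method. The helper add_arg() and the restriction to &, |, ! and parentheses are assumptions made here.

```r
# Hypothetical sketch: pass_tests(x, test1 & (test2 | test3))
pass_tests <- function(x, tests) {
  expr <- substitute(tests)  # e.g. quote(is.character | is.numeric)

  # Recursively turn each bare function name `f` into the call `f(.x)`,
  # keeping &, |, ! and parentheses as they are.
  add_arg <- function(e) {
    if (is.symbol(e)) {
      return(as.call(list(e, quote(.x))))
    }
    ops <- c("&", "|", "!", "(")
    if (is.call(e) && is.symbol(e[[1]]) && as.character(e[[1]]) %in% ops) {
      for (i in seq_len(length(e))[-1]) e[[i]] <- add_arg(e[[i]])
      return(e)
    }
    stop("Unsupported element in test expression: ", deparse(e))
  }

  # Evaluate with .x bound to the object; the test functions themselves
  # are looked up starting from the caller's environment.
  eval(add_arg(expr), list(.x = x), parent.frame())
}

pass_tests("a", is.character | is.numeric)              # TRUE
pass_tests("a", is.character & is.numeric)              # FALSE
pass_tests(1, is.finite & (is.character | is.numeric))  # TRUE
```

R's own parser builds the captured expression, so & keeps its usual higher precedence over |. No parse() of strings is needed.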
Here's another way to do it by creating infix operators.
`%and%` <- function(lhs, rhs) {
  function(...) lhs(...) & rhs(...)
}
`%or%` <- function(lhs, rhs) {
  function(...) lhs(...) | rhs(...)
}
(is.character %and% is.numeric)('a')
#> [1] FALSE
(is.character %or% is.numeric)('a')
#> [1] TRUE
These can be chained together. However, they will not follow the normal AND/OR precedence: every %op% operator in R has the same precedence, so the chain is evaluated left to right.
(is.double %and% is.numeric %and% is.finite)(12)
#> [1] TRUE
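This small sketch makes the left-to-right grouping concrete. It compares a chained expression with the same test written using the ordinary operators, where & binds more tightly than |. The names f1, f2 and g are illustrative only.

```r
`%and%` <- function(lhs, rhs) function(...) lhs(...) & rhs(...)
`%or%`  <- function(lhs, rhs) function(...) lhs(...) | rhs(...)

# All %op% operators share one precedence level and group left to right,
# so this is ((is.character %or% is.numeric) %and% is.finite).
f1 <- is.character %or% is.numeric %and% is.finite

# The same tests with base operators: & binds tighter than |,
# so this is is.character(x) | (is.numeric(x) & is.finite(x)).
g <- function(x) is.character(x) | is.numeric(x) & is.finite(x)

f1("a")  # FALSE, because is.finite("a") is FALSE
g("a")   # TRUE

# Parentheses give explicit grouping with the infix operators.
f2 <- is.finite %and% (is.character %or% is.numeric)
f2(1)    # TRUE
```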
I would like to be able to print the name of a data frame passed through the pipe. Is this possible? I can do this:
printname <- function(df){
  print(paste(substitute(df)))
}
printname(mtcars)
#[1] "mtcars"
However, it returns "." when this function is piped using the magrittr pipe.
mtcars %>% printname
# [1] "."
This would be helpful when writing custom error messages of functions used in logged production processes -- it's hard to know where something failed if the only thing in the log is "."
It would probably be enough to return the original call, which would include the mtcars %>% piece.
This is a first attempt. It's something of a hack, but it seems like it might work.
find_chain_parts <- function() {
  i <- 1
  while (!("chain_parts" %in% ls(envir = parent.frame(i))) && i < sys.nframe()) {
    i <- i + 1
  }
  parent.frame(i)
}
printfirstname <- function(df){
  ee <- find_chain_parts()
  print(deparse(ee$lhs))
}
mtcars %>% printfirstname
# [1] "mtcars"
The pipe function creates an environment that keeps track of the chain parts. I walk up the current execution environments looking for this variable and then use the lhs info stored there to find the symbol at the start of the pipe. This isn't well tested.
As Tom and Lionel Henry commented on MrFlick's answer, the accepted answer no longer works under magrittr 2.
A new answer, then, eschews deparse(substitute()) for sys.calls(). I got this from Artem Sokolov's answer here. I won't pretend to fully understand what's happening, but it works for me:
x_expression <- function(x) {
  getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
  sc <- sys.calls()
  ASTs <- purrr::map(as.list(sc), getAST) %>%
    purrr::keep(~ identical(.[[1]], quote(`%>%`)))  # Match first element to %>%
  if (length(ASTs) == 0) return(rlang::enexpr(x))   # Not in a pipe
  dplyr::last(ASTs)[[2]]                            # Second element is the left-hand side
}
which gives the desired output for both piped and non-piped notation:
x_expression(mtcars)
# mtcars
mtcars %>% x_expression()
# mtcars
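The same sys.calls() idea can be written in base R without purrr or dplyr, which may make the mechanism easier to see. sys.calls() returns every call currently on the stack. The function keeps those whose first element is the %>% symbol, and element 2 of the most recent such call is the pipe's left-hand side. This is a sketch, and the name x_expression_base is hypothetical. Like the answer above, it returns the pipe call's whole left-hand side expression, which in a longer chain may be more than a bare name. The last usage line assumes R 4.1 or later. There the native pipe |> is rewritten by the parser into a plain call, so substitute() alone already sees mtcars.

```r
library(magrittr)

# Hypothetical base-R version of x_expression(): no purrr or dplyr needed.
x_expression_base <- function(x) {
  calls <- as.list(sys.calls())
  # Flag every call on the stack whose function is the magrittr pipe.
  is_pipe <- vapply(calls, function(cl) identical(cl[[1]], quote(`%>%`)), logical(1))
  # Not inside a %>% pipe: return the expression passed as `x`.
  if (!any(is_pipe)) return(substitute(x))
  # Most recent pipe call; its second element is the left-hand side.
  calls[[max(which(is_pipe))]][[2]]
}

x_expression_base(mtcars)        # mtcars
mtcars %>% x_expression_base()   # mtcars
mtcars |> x_expression_base()    # mtcars (native pipe, R >= 4.1)
```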
I'm working with dplyr and created code to compute new data that is plotted with ggplot.
I want to wrap this code in a function. It should take the name of a column of the data frame that is manipulated by dplyr. However, passing column names this way does not work. Please consider the minimal example below:
df <- data.frame(A = seq(-5, 5, 1), B = seq(0,10,1))
library(dplyr)
foo <- function (x) {
  df %>%
    filter(x < 1)
}
foo(B)
Error in filter_impl(.data, dots(...), environment()) :
object 'B' not found
Is there any solution to use the name of a column as a function argument?
If you want to create a function that accepts the string "B" as an argument (as in your question's title):
foo_string <- function (x) {
  eval(substitute(df %>% filter(xx < 1), list(xx = as.name(x))))
}
foo_string("B")
If you want to create a function that captures the bare name B as an argument (as dplyr does):
foo_nse <- function (x) {
  # capture the argument without evaluating it
  x <- substitute(x)
  eval(substitute(df %>% filter(xx < 1), list(xx = x)))
}
foo_nse(B)
You can find more information in Advanced R
Edit
dplyr makes things easier in version 0.3: functions with the suffix "_" accept a string or an expression as an argument. (These underscore verbs were later deprecated, in dplyr 0.7, in favor of tidy evaluation.)
foo_string <- function (x) {
  # construct the string
  string <- paste(x, "< 1")
  # use filter_ instead of filter
  df %>% filter_(string)
}
foo_string("B")
foo_nse <- function (x) {
  # capture the argument without evaluating it
  x <- substitute(x)
  # construct the expression
  expression <- lazyeval::interp(quote(xx < 1), xx = x)
  # use filter_ instead of filter
  df %>% filter_(expression)
}
foo_nse(B)
You can find more information in this vignette
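The underscore verbs above were later deprecated. Here is a sketch of the same two functions using tidy evaluation in recent dplyr. {{ }} forwards a bare column name, and the .data pronoun looks a column up from a string. Unlike the functions above, these take the data frame as an argument instead of reading the global df; that is a design choice of this sketch.

```r
library(dplyr)

df <- data.frame(A = seq(-5, 5, 1), B = seq(0, 10, 1))

# Bare column name: {{ x }} passes the caller's expression through to filter().
foo_nse <- function(data, x) {
  data %>% filter({{ x }} < 1)
}

# Column name as a string: .data[[x]] looks the column up by name.
foo_string <- function(data, x) {
  data %>% filter(.data[[x]] < 1)
}

foo_nse(df, B)        # one row: A = -5, B = 0
foo_string(df, "B")   # same result
```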
I remember a similar question that was answered by Richard Scriven. I think you need to write something like this:
foo <- function(x,...)filter(x,...)
What Richard Scriven pointed out was that you need to use ... here. If you type ?filter, you will find its signature: filter(.data, ...). I think you replace .data with x or whatever name you like. If you want to pick the rows of your df that have values smaller than 1 in B, it looks like this:
foo <- function (x,...) filter(x,...)
foo(df, B < 1)