I would like to be able to print the name of a dataframe passed through the pipe. Is this possible? I can do.
printname <- function(df){
print(paste(substitute(df)))
}
printname(mtcars)
#[1] "mtcars"
However, it returns "." when this function is piped using the magrittr pipe.
mtcars %>% printname
# [1] "."
This would be helpful when writing custom error messages of functions used in logged production processes -- it's hard to know where something failed if the only thing in the log is "."
It would probably be enough to return the original call, which would include the mtcars %>% piece.
This is a first attempt, it's kind of a hack, but seems like it might work.
find_chain_parts <- function() {
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
parent.frame(i)
}
printfirstname <- function(df){
ee <- find_chain_parts()
print(deparse(ee$lhs))
}
mtcars %>% printfirstname
# [1] "mtcars"
The pipe function creates an environment that keeps track of the chain parts. I tried walking up the current execution environments looking for this variable and then use the lhs info stored there to find the symbol at the start of the pipe. This isn't well tested.
As Tom & Lionel Henry commented on MrFlick's answer, the accepted answer no long works under more magrittr 2.
A new answer, then, eschews deparse(substitute()) for sys.calls(). I get this from Artem Sokolov's answer here. I won't pretend to fully understand what's happening but it works for me:
x_expression <- function(x) {
getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
sc <- sys.calls()
ASTs <- purrr::map( as.list(sc), getAST ) %>%
purrr::keep( ~identical(.[[1]], quote(`%>%`)) ) # Match first element to %>%
if( length(ASTs) == 0 ) return( enexpr(x) ) # Not in a pipe
dplyr::last( ASTs )[[2]] # Second element is the left-hand side
}
which gives the desired output, for both pipe and non-piped notation:
x_expression(mtcars)
# mtcars
mtcars %>% x_expression()
# mtcars
Related
I'd like to use a utility function to check whether a given column exists within a given data.frame. I'm piping within the tidyverse. The best I've come up with so far is
library(magrittr)
columnExists <- function(data, col) {
tryCatch({
rlang::as_label(rlang::enquo(col)) %in% names(data)
},
error=function(e) FALSE
)
}
This works in the global environment
> mtcars %>% columnExists(mpg)
[1] TRUE
> mtcars %>% columnExists(bad)
[1] FALSE
But not when called from within another function, which is my actual use case
outerFunction <- function(d, col) {
d %>% columnExists((col))
}
> mtcars %>% outerFunction(mpg) # Expected TRUE
[1] FALSE
> mtcars %>% outerFunction(bad) # Expected FALSE
[1] FALSE
What am I doing wrong? Is it possible to have a single function that works correctly in the global environment and also when nested in another function?
I have found several SO posts related to checking for the existence of a given column or columns, but they all seem to assume either that the column name will be passed as a string or the call to check existence is not nested (or both). That is not the case here.
You want to pass though the original symbol in your outerFunction. Use
outerFunction <- function(d, col) {
d %>% columnExists( {{col}} )
}
The "embrace" syntax will prevent early evaluation.
This question is related to Passing variables to functions that use `enquo()`.
I have a higher function with arguments of a tibble (dat) and the columns of interest in dat (variables_of_interest_in_dat). Within that function, there is a call to another function to which I want to pass variables_of_interest_in_dat.
higher_function <- function(dat, variables_of_interest_in_dat){
variables_of_interest_in_dat <- enquos(variables_of_interest_in_dat)
lower_function(dat, ???variables_of_interest_in_dat???)
}
lower_function <- function(dat, variables_of_interest_in_dat){
variables_of_interest_in_dat <- enquos(variables_of_interest_in_dat)
dat %>%
select(!!!variables_of_interest_in_dat)
}
What is the recommended way to pass variables_of_interest_in_dat to lower_function?
I have tried lower_function(dat, !!!variables_of_interest_in_dat) but when I run higher_function(mtcars, cyl) this returns "Error: Can't use !!! at top level."
In the related post, the higher_function did not enquo the variables before passing them to lower function.
Thank you
Is this what you want?
library(tidyverse)
LF <- function(df,var){
newdf <- df %>% select({{var}})
return(newdf)
}
HF <- function(df,var){
LF(df,{{var}})
}
LF(mtcars,disp)
HF(mtcars,disp)
the {{}} (aka 'curly curly') operator replaces the approach of quoting with enquo()
Access result later in pipe
I am trying to create functions which print the number of rows excluded in a dataset at each step in a pipe.
Something like this:
iris %>%
function_which_save_nrows_and_return_the_data() %>%
filter(exclude some rows) %>%
function_which_prints_difference_in_rows_before_after_exlusion_and_returns_data %>%
function_which_save_nrows_and_return_the_data() %>%
function_which_prints_difference_in_rows_before_after_exlusion_and_returns_data ...etc
These are the functions I have attempted:
n_before = function(x) {assign("rows", nrow(x), .GlobalEnv); return(x)}
n_excluded = function(x) {
print(rows - nrow(x))
return(x)
}
This successfully saves the object rows:
But if I add two more links, the object is NOT saved:
So how can I create and access the rows-object later the pipe?
This is due to R's lazy evaluation. It occurs even if pipes are not used. See code below. In that code the argument to n_excluded is filter(n_before(iris), Species != 'setosa') and at the point that rows is used in the print statement the argument has not been referenced from within n_excluded so the entire argument will not have been evaluated and so rows does not yet exist.
if (exists("rows")) rm(rows) # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'x' in selecting a method for function
## 'print': object 'rows' not found
To fix this
1) we can force x before the print statement.
n_excluded = function(x) {
force(x)
print(rows - nrow(x))
return(x)
}
2) Alternately, we can use the magrittr sequential pipe which guarantees that legs are run in order. magrittr makes it available but does not provide an operator for it but we can assign it to an operator like this.
`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
n_before() %>%
filter(Species != 'setosa') %s>% # note use of %s>% on this line
n_excluded()
The magrittr developer has stated that he will add it as an operator if there is sufficient demand for it so you might want to add such request to magrittr issue #247 on github.
You can also use the extended capabilities of pipeR.
library(dplyr)
library(pipeR)
n_excluded = function(x) {
print(rows - nrow(x))
return(x)
}
p <- iris %>>%
(~rows=nrow(.)) %>>%
filter(Species != "setosa") %>>%
n_excluded()
require(magrittr)
require(purrr)
is.out.same <- function(.call, ...) {
## Checks if args in .call will produce identical output in other functions
call <- substitute(.call) # Captures function call
f_names <- eval(substitute(alist(...))) # Makes list of f_names
map2(rep(list(call), length(f_names)), # Creates list of new function calls
f_names,
function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
) %>%
map(eval) %>% # Evaluates function calls
map_lgl(identical, x = .call) %>% # Checks output of new calls against output of original call
all() # Returns TRUE if calls produce identical outputs
}
is.out.same(map(1:3, cumsum), lapply) # This works
map(1:3, cumsum) %>% # Is there any way to make this work?
is.out.same(lapply)
My function takes a function call as an argument.
Is there any way of making my function pipeable? Right now, the problem is that whatever function I call will be evaluated before the pipe. The only thing I can think of is using a function to 'unevaluate' the value, but this doesn't seem possible.
I wouldn't recommend one actually does this. The pipe operator is designed to make it easy to pass the output of one function as the input of the next. But that's not really what you're doing here at all. You want to manipulate the entire call stack. But it is technically is possible to do this. You just need to do some extra work to find the chain "meta-data" to see what was originally passed in. Here I put in two helper functions to extract the relevant info.
find_chain_parts <- function() {
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
parent.frame(i)
}
find_lhs <- function(x) {
env <- find_chain_parts()
if(exists("chain_parts",env)) {
return(env$chain_parts$lhs)
} else {
return(do.call("substitute", list(substitute(x), parent.frame())))
}
}
These functions walk up the call stack to find the original pipe call. If there is one present, it will extract the expression from the left hand side, if not, it will just substitute on the original parameter. You would just change your function to use
is.out.same <- function(.call, ...) {
call <- find_lhs(.call) # Captures function call
f_names <- eval(substitute(alist(...))) # Makes list of f_names
map2(rep(list(call), length(f_names)), # Creates list of new function calls
f_names,
function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
) %>%
map(eval) %>% # Evaluates function calls
map_lgl(identical, x = .call) %>% # Checks output of new calls against output of original call
all() # Returns TRUE if calls produce identical outputs
}
Then both of these would run
is.out.same(map(1:3, cumsum), lapply)
# [1] TRUE
map(1:3, cumsum) %>%
is.out.same(lapply)
# [1] TRUE
But if you are really testing for functional equivalence for expressions, it would make much more sense to pass in quosures. Then you wouldn't need the different branches. Such a function would look like this
library(rlang)
is.out.same <- function(call, ...) {
f_names <- eval(substitute(alist(...))) # Makes list of f_names
map2(rep(list(call), length(f_names)), # Creates list of new function calls
f_names,
function(.x, .y, i) {.x[[2]][[1]] <- .y; return(.x)}
) %>%
map(eval_tidy) %>% # Evaluates function calls
map_lgl(identical, x = eval_tidy(call)) %>% # Checks output of new calls against output of original call
all() # Returns TRUE if calls produce identical outputs
}
and you would call it one of the following ways
is.out.same(quo(map(1:3, cumsum)), lapply)
quo(map(1:3, cumsum)) %>%
is.out.same(lapply)
This makes the intent much clearer in my opinion.
I would like to be able to print the name of a dataframe passed through the pipe. Is this possible? I can do.
printname <- function(df){
print(paste(substitute(df)))
}
printname(mtcars)
#[1] "mtcars"
However, it returns "." when this function is piped using the magrittr pipe.
mtcars %>% printname
# [1] "."
This would be helpful when writing custom error messages of functions used in logged production processes -- it's hard to know where something failed if the only thing in the log is "."
It would probably be enough to return the original call, which would include the mtcars %>% piece.
This is a first attempt, it's kind of a hack, but seems like it might work.
find_chain_parts <- function() {
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
parent.frame(i)
}
printfirstname <- function(df){
ee <- find_chain_parts()
print(deparse(ee$lhs))
}
mtcars %>% printfirstname
# [1] "mtcars"
The pipe function creates an environment that keeps track of the chain parts. I tried walking up the current execution environments looking for this variable and then use the lhs info stored there to find the symbol at the start of the pipe. This isn't well tested.
As Tom & Lionel Henry commented on MrFlick's answer, the accepted answer no long works under more magrittr 2.
A new answer, then, eschews deparse(substitute()) for sys.calls(). I get this from Artem Sokolov's answer here. I won't pretend to fully understand what's happening but it works for me:
x_expression <- function(x) {
getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
sc <- sys.calls()
ASTs <- purrr::map( as.list(sc), getAST ) %>%
purrr::keep( ~identical(.[[1]], quote(`%>%`)) ) # Match first element to %>%
if( length(ASTs) == 0 ) return( enexpr(x) ) # Not in a pipe
dplyr::last( ASTs )[[2]] # Second element is the left-hand side
}
which gives the desired output, for both pipe and non-piped notation:
x_expression(mtcars)
# mtcars
mtcars %>% x_expression()
# mtcars