Get name of dataframe passed through pipe in R - r

I would like to be able to print the name of a dataframe passed through the pipe. Is this possible? I can do.
printname <- function(df){
print(paste(substitute(df)))
}
printname(mtcars)
#[1] "mtcars"
However, it returns "." when this function is piped using the magrittr pipe.
mtcars %>% printname
# [1] "."
This would be helpful when writing custom error messages of functions used in logged production processes -- it's hard to know where something failed if the only thing in the log is "."
It would probably be enough to return the original call, which would include the mtcars %>% piece.

This is a first attempt, it's kind of a hack, but seems like it might work.
find_chain_parts <- function() {
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
parent.frame(i)
}
printfirstname <- function(df){
ee <- find_chain_parts()
print(deparse(ee$lhs))
}
mtcars %>% printfirstname
# [1] "mtcars"
The pipe function creates an environment that keeps track of the chain parts. I tried walking up the current execution environments looking for this variable and then use the lhs info stored there to find the symbol at the start of the pipe. This isn't well tested.

As Tom & Lionel Henry commented on MrFlick's answer, the accepted answer no long works under more magrittr 2.
A new answer, then, eschews deparse(substitute()) for sys.calls(). I get this from Artem Sokolov's answer here. I won't pretend to fully understand what's happening but it works for me:
x_expression <- function(x) {
getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
sc <- sys.calls()
ASTs <- purrr::map( as.list(sc), getAST ) %>%
purrr::keep( ~identical(.[[1]], quote(`%>%`)) ) # Match first element to %>%
if( length(ASTs) == 0 ) return( enexpr(x) ) # Not in a pipe
dplyr::last( ASTs )[[2]] # Second element is the left-hand side
}
which gives the desired output, for both pipe and non-piped notation:
x_expression(mtcars)
# mtcars
mtcars %>% x_expression()
# mtcars

Related

NSE in nested function calls

I'd like to use a utility function to check whether a given column exists within a given data.frame. I'm piping within the tidyverse. The best I've come up with so far is
library(magrittr)
columnExists <- function(data, col) {
tryCatch({
rlang::as_label(rlang::enquo(col)) %in% names(data)
},
error=function(e) FALSE
)
}
This works in the global environment
> mtcars %>% columnExists(mpg)
[1] TRUE
> mtcars %>% columnExists(bad)
[1] FALSE
But not when called from within another function, which is my actual use case
outerFunction <- function(d, col) {
d %>% columnExists((col))
}
> mtcars %>% outerFunction(mpg) # Expected TRUE
[1] FALSE
> mtcars %>% outerFunction(bad) # Expected FALSE
[1] FALSE
What am I doing wrong? Is it possible to have a single function that works correctly in the global environment and also when nested in another function?
I have found several SO posts related to checking for the existence of a given column or columns, but they all seem to assume either that the column name will be passed as a string or the call to check existence is not nested (or both). That is not the case here.
You want to pass though the original symbol in your outerFunction. Use
outerFunction <- function(d, col) {
d %>% columnExists( {{col}} )
}
The "embrace" syntax will prevent early evaluation.

Access result later in pipe

Access result later in pipe
I am trying to create functions which print the number of rows excluded in a dataset at each step in a pipe.
Something like this:
iris %>%
function_which_save_nrows_and_return_the_data() %>%
filter(exclude some rows) %>%
function_which_prints_difference_in_rows_before_after_exlusion_and_returns_data %>%
function_which_save_nrows_and_return_the_data() %>%
function_which_prints_difference_in_rows_before_after_exlusion_and_returns_data ...etc
These are the functions I have attempted:
n_before = function(x) {assign("rows", nrow(x), .GlobalEnv); return(x)}
n_excluded = function(x) {
print(rows - nrow(x))
return(x)
}
This successfully saves the object rows:
But if I add two more links, the object is NOT saved:
So how can I create and access the rows-object later the pipe?
This is due to R's lazy evaluation. It occurs even if pipes are not used. See code below. In that code the argument to n_excluded is filter(n_before(iris), Species != 'setosa') and at the point that rows is used in the print statement the argument has not been referenced from within n_excluded so the entire argument will not have been evaluated and so rows does not yet exist.
if (exists("rows")) rm(rows) # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'x' in selecting a method for function
## 'print': object 'rows' not found
To fix this
1) we can force x before the print statement.
n_excluded = function(x) {
force(x)
print(rows - nrow(x))
return(x)
}
2) Alternately, we can use the magrittr sequential pipe which guarantees that legs are run in order. magrittr makes it available but does not provide an operator for it but we can assign it to an operator like this.
`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
n_before() %>%
filter(Species != 'setosa') %s>% # note use of %s>% on this line
n_excluded()
The magrittr developer has stated that he will add it as an operator if there is sufficient demand for it so you might want to add such request to magrittr issue #247 on github.
You can also use the extended capabilities of pipeR.
library(dplyr)
library(pipeR)
n_excluded = function(x) {
print(rows - nrow(x))
return(x)
}
p <- iris %>>%
(~rows=nrow(.)) %>>%
filter(Species != "setosa") %>>%
n_excluded()

Making a function that takes a function call as an argument into a pipeable function.

require(magrittr)
require(purrr)
is.out.same <- function(.call, ...) {
## Checks if args in .call will produce identical output in other functions
call <- substitute(.call) # Captures function call
f_names <- eval(substitute(alist(...))) # Makes list of f_names
map2(rep(list(call), length(f_names)), # Creates list of new function calls
f_names,
function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
) %>%
map(eval) %>% # Evaluates function calls
map_lgl(identical, x = .call) %>% # Checks output of new calls against output of original call
all() # Returns TRUE if calls produce identical outputs
}
is.out.same(map(1:3, cumsum), lapply) # This works
map(1:3, cumsum) %>% # Is there any way to make this work?
is.out.same(lapply)
My function takes a function call as an argument.
Is there any way of making my function pipeable? Right now, the problem is that whatever function I call will be evaluated before the pipe. The only thing I can think of is using a function to 'unevaluate' the value, but this doesn't seem possible.
I wouldn't recommend one actually does this. The pipe operator is designed to make it easy to pass the output of one function as the input of the next. But that's not really what you're doing here at all. You want to manipulate the entire call stack. But it is technically is possible to do this. You just need to do some extra work to find the chain "meta-data" to see what was originally passed in. Here I put in two helper functions to extract the relevant info.
find_chain_parts <- function() {
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
parent.frame(i)
}
find_lhs <- function(x) {
env <- find_chain_parts()
if(exists("chain_parts",env)) {
return(env$chain_parts$lhs)
} else {
return(do.call("substitute", list(substitute(x), parent.frame())))
}
}
These functions walk up the call stack to find the original pipe call. If there is one present, it will extract the expression from the left hand side, if not, it will just substitute on the original parameter. You would just change your function to use
is.out.same <- function(.call, ...) {
call <- find_lhs(.call) # Captures function call
f_names <- eval(substitute(alist(...))) # Makes list of f_names
map2(rep(list(call), length(f_names)), # Creates list of new function calls
f_names,
function(.x, .y, i) {.x[[1]] <- .y; return(.x)}
) %>%
map(eval) %>% # Evaluates function calls
map_lgl(identical, x = .call) %>% # Checks output of new calls against output of original call
all() # Returns TRUE if calls produce identical outputs
}
Then both of these would run
is.out.same(map(1:3, cumsum), lapply)
# [1] TRUE
map(1:3, cumsum) %>%
is.out.same(lapply)
# [1] TRUE
But if you are really testing for functional equivalence for expressions, it would make much more sense to pass in quosures. Then you wouldn't need the different branches. Such a function would look like this
library(rlang)
is.out.same <- function(call, ...) {
f_names <- eval(substitute(alist(...))) # Makes list of f_names
map2(rep(list(call), length(f_names)), # Creates list of new function calls
f_names,
function(.x, .y, i) {.x[[2]][[1]] <- .y; return(.x)}
) %>%
map(eval_tidy) %>% # Evaluates function calls
map_lgl(identical, x = eval_tidy(call)) %>% # Checks output of new calls against output of original call
all() # Returns TRUE if calls produce identical outputs
}
and you would call it one of the following ways
is.out.same(quo(map(1:3, cumsum)), lapply)
quo(map(1:3, cumsum)) %>%
is.out.same(lapply)
This makes the intent much clearer in my opinion.

Capturing dots in magrittr pipe [duplicate]

I would like to be able to print the name of a dataframe passed through the pipe. Is this possible? I can do.
printname <- function(df){
print(paste(substitute(df)))
}
printname(mtcars)
#[1] "mtcars"
However, it returns "." when this function is piped using the magrittr pipe.
mtcars %>% printname
# [1] "."
This would be helpful when writing custom error messages of functions used in logged production processes -- it's hard to know where something failed if the only thing in the log is "."
It would probably be enough to return the original call, which would include the mtcars %>% piece.
This is a first attempt, it's kind of a hack, but seems like it might work.
find_chain_parts <- function() {
i <- 1
while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
i <- i+1
}
parent.frame(i)
}
printfirstname <- function(df){
ee <- find_chain_parts()
print(deparse(ee$lhs))
}
mtcars %>% printfirstname
# [1] "mtcars"
The pipe function creates an environment that keeps track of the chain parts. I tried walking up the current execution environments looking for this variable and then use the lhs info stored there to find the symbol at the start of the pipe. This isn't well tested.
As Tom & Lionel Henry commented on MrFlick's answer, the accepted answer no long works under more magrittr 2.
A new answer, then, eschews deparse(substitute()) for sys.calls(). I get this from Artem Sokolov's answer here. I won't pretend to fully understand what's happening but it works for me:
x_expression <- function(x) {
getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
sc <- sys.calls()
ASTs <- purrr::map( as.list(sc), getAST ) %>%
purrr::keep( ~identical(.[[1]], quote(`%>%`)) ) # Match first element to %>%
if( length(ASTs) == 0 ) return( enexpr(x) ) # Not in a pipe
dplyr::last( ASTs )[[2]] # Second element is the left-hand side
}
which gives the desired output, for both pipe and non-piped notation:
x_expression(mtcars)
# mtcars
mtcars %>% x_expression()
# mtcars

Using assign to define variable in scope of function

I have the following snippet
library(magrittr)
test_fun <- function(x) {
foo <- x %>%
assign("boo", .)
boo
}
test_fun("hello") # I want this to return "hello"
# Error in test_fun("hello") : object 'boo' not found
I'd like to be able to assign values to names in the scope of the function. Is there a way to do it?
EDIT:
The reason behind the pipe here is that, in my actual use case, I'd like to save some intermediate result that can be referred to later on in the pipeline. Put another way, instead of writing e.g.
foo1 <- data %>% ...
foo2 <- foo1 %>% ...
foo3 <- some manipulation with foo1 and foo2
I can do something like
foo <- data %>% ... %>% (assign to foo2) %>% ... %>% some manipulation with foo2
I'm under the impression that this would be a "cleaner" way to code, but am happy to learn otherwise if it's not good practice.
I don't necessarily recommend this (without more understanding of all the requirements), but the issue you are having is with the scope that assign sees at the time of execution. Regardless of where it is looking, what's important is that it is in fact assigning to a variable named boo but not in the scope of test_fun nor in a way that is easily retrievable later.
A quick solution might be to do something like:
test_fun <- function(x) {
myenv <- environment()
foo <- x %>%
assign("boo", ., envir = myenv)
# something else with 'foo'? side-effect? unknown ...
boo
}
test_fun("hello")
# [1] "hello"
Since assign invisibly returns the value stored ("." in your context), you should be able to keep the pipeline moving after it.

Resources