Access result later in pipe
I am trying to create functions that print the number of rows excluded from a dataset at each step in a pipe.
Something like this:
iris %>%
  function_which_save_nrows_and_return_the_data() %>%
  filter(exclude some rows) %>%
  function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data() %>%
  function_which_save_nrows_and_return_the_data() %>%
  function_which_prints_difference_in_rows_before_after_exclusion_and_returns_data() ...etc
These are the functions I have attempted:
n_before = function(x) {assign("rows", nrow(x), .GlobalEnv); return(x)}
n_excluded = function(x) {
  print(rows - nrow(x))
  return(x)
}
This successfully saves the object rows:
But if I add two more links, the object is NOT saved:
So how can I create the rows object and access it later in the pipe?
This is due to R's lazy evaluation, and it occurs even if pipes are not used, as the code below shows. In that code the argument to n_excluded is filter(n_before(iris), Species != 'setosa'). At the point where rows is used in the print statement, x has not yet been referenced from within n_excluded, so the argument has not been evaluated, n_before has not run, and rows does not yet exist.
if (exists("rows")) rm(rows) # ensure rows does not exist
n_excluded(filter(n_before(iris), Species != 'setosa'))
## Error in h(simpleError(msg, call)) :
## error in evaluating the argument 'x' in selecting a method for function
## 'print': object 'rows' not found
To fix this
1) we can force x before the print statement.
n_excluded = function(x) {
  force(x)
  print(rows - nrow(x))
  return(x)
}
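The effect of force() can be reproduced without any pipe and without dplyr. The following is a minimal base-R sketch, not the original poster's code: subset() stands in for filter(), and n_excluded_lazy and n_excluded_forced are names made up here for the two variants.

```r
# Base R only; subset() stands in for dplyr::filter().
# n_excluded_lazy and n_excluded_forced are illustrative names.
if (exists("rows", envir = globalenv(), inherits = FALSE)) {
  rm("rows", envir = globalenv())
}

n_before <- function(x) {
  assign("rows", nrow(x), envir = globalenv())
  x
}

# Lazy variant: `rows` is looked up before the promise `x` is evaluated,
# so n_before() has not run yet and the lookup fails.
n_excluded_lazy <- function(x) {
  print(rows - nrow(x))
  x
}

# Forced variant: force(x) evaluates the whole argument first,
# which runs n_before() and creates `rows`.
n_excluded_forced <- function(x) {
  force(x)
  print(rows - nrow(x))
  x
}

# The lazy call fails; keep the error message instead of stopping.
lazy_result <- tryCatch(
  n_excluded_lazy(subset(n_before(iris), Species != "setosa")),
  error = function(e) conditionMessage(e)
)

# The forced call prints 50 (150 rows before, 100 after).
forced_result <- n_excluded_forced(subset(n_before(iris), Species != "setosa"))
```

Because the lazy call fails before its argument is ever evaluated, n_before() is not run by it at all, which is why rows is still missing afterwards.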
2) Alternatively, we can use the magrittr sequential (eager) pipe, which guarantees that the legs are run in order. magrittr exports it as the function pipe_eager_lexical but does not provide an operator for it, so we assign it to an operator ourselves like this.
`%s>%` <- magrittr::pipe_eager_lexical
iris %>%
  n_before() %>%
  filter(Species != 'setosa') %s>% # note use of %s>% on this line
  n_excluded()
The magrittr developer has stated that he will add it as an operator if there is sufficient demand for it, so you might want to add such a request to magrittr issue #247 on GitHub.
You can also use the extended capabilities of pipeR.
library(dplyr)
library(pipeR)
n_excluded = function(x) {
  print(rows - nrow(x))
  return(x)
}
p <- iris %>>%
  (~rows=nrow(.)) %>>%
  filter(Species != "setosa") %>>%
  n_excluded()
Related
I'm trying to do anonymous recursion in R and also playing with pipes to learn. The code below works well
sorttt <- function(list){
  if (length(list) == 0) c() else c(max(list), Recall(list[list < max(list)]))
}
example %>% sorttt
But this code errors out with the error: Error in example %>% function(list) { : invalid formal argument list for "function"
example %>% function(list){if (length(list) == 0) c() else c(max(list), Recall(list[list < max(list)]))}
Does anyone know why these two might act differently? These seem to be logically the same thing.
You need to wrap anonymous functions in parentheses for them to work with pipes.
## doesn't work
1:10 %>% function(x) {mean(x)}
# Error in 1:10 %>% function(x) { :
# invalid formal argument list for "function"
## works with parens
1:10 %>% (function(x) {mean(x)})
# [1] 5.5
Same thing for your function:
1:3 %>%
(function(list){if (length(list) == 0) c() else c(max(list), Recall(list[list < max(list)]))})
# [1] 3 2 1
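This is because function is itself a function, so the pipe treats the right-hand side like any other call and inserts the left-hand side as its first argument. A %>% function(x){...} is interpreted as function(A, x){...}, which is not a valid formal argument list. The parentheses make sure the whole function definition is evaluated before the pipe inserts an argument.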
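What the pipe sees on its right-hand side can be inspected with quote(), using base R only. This sketch does not run the pipe itself; it only shows how R parses the two forms. The names rhs_bare and rhs_paren are chosen here for illustration.

```r
# The bare right-hand side is a call whose head is the symbol `function`.
rhs_bare <- quote(function(x) mean(x))
is.call(rhs_bare)                              # TRUE
identical(rhs_bare[[1]], as.name("function"))  # TRUE

# With parentheses, the head of the call is `(` instead,
# and evaluating the expression yields an ordinary function object.
rhs_paren <- quote((function(x) mean(x)))
identical(rhs_paren[[1]], as.name("("))        # TRUE
f <- eval(rhs_paren)
is.function(f)                                 # TRUE
f(1:10)                                        # 5.5
```

Since the unparenthesized form is just a call to function, a pipe that rewrites calls by inserting a first argument ends up modifying the function definition itself rather than calling the resulting function.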
In my dataset, I have a few possible grouping variables a, b, c. How do I programmatically tell dplyr to not group by any variables?
For example:
granularity <- NA
if(isTRUE(granularity == 'all')){
  # all group variables
  group_variables <- quos(a, b, c)
}else if(isTRUE(granularity == 'no_c')){
  # all except c
  group_variables <- quos(a, b)
}else{
  # no group variables
  group_variables <- quo()
}
data_summary <- mydata %>%
  group_by(!!! group_variables) %>%
  summarise(
    x_mean = mean(x)
  )
This will run correctly if I set granularity to 'all' or 'no_c', but it fails when I assign group_variables to the empty quosure. Does anyone know how to make this work?
Edit: This question also applies to functions like select, so assume I wanted to run
data_select <- mydata %>%
select(!!! select_variables, d, e, f)
How do I set select_variables to sometimes be quos(a, b, c) or sometimes be empty?
Thanks!
Use group_variables <- NULL in that clause:
}else{
  # no group variables
  group_variables <- NULL
}
Also note the massive warning:
Error in grouped_df_impl(data, unname(vars), drop) :
Column `<empty>` is unknown
In addition: Warning message:
Unquoting language objects with `!!!` is soft-deprecated as of rlang 0.3.0.
Please use `!!` instead.
# Bad:
dplyr::select(data, !!!enquo(x))
# Good:
dplyr::select(data, !!enquo(x)) # Unquote single quosure
dplyr::select(data, !!!enquos(x)) # Splice list of quosures
You might want to consider not using packages with unstable APIs.
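A sketch of the NULL approach on a built-in dataset, assuming reasonably recent versions of dplyr and rlang are installed. Here mtcars and its columns cyl and gear stand in for mydata and a, b, c; pick_groups and summarise_by are hypothetical helpers, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: a list of quosures, or NULL for "no grouping".
pick_groups <- function(granularity) {
  if (identical(granularity, "all")) {
    quos(cyl, gear)
  } else if (identical(granularity, "no_gear")) {
    quos(cyl)
  } else {
    NULL
  }
}

# Hypothetical helper: summarise mtcars at the requested granularity.
summarise_by <- function(granularity) {
  group_variables <- pick_groups(granularity)
  mtcars %>%
    group_by(!!!group_variables) %>%
    summarise(mpg_mean = mean(mpg))
}

by_all  <- summarise_by("all")  # one row per cyl/gear combination
by_none <- summarise_by(NA)     # a single row: no grouping at all

# The same idea for select(): splicing NULL adds no columns.
select_variables <- NULL
selected <- mtcars %>% select(!!!select_variables, mpg, hp)
```

Splicing NULL with !!! contributes no arguments, so group_by() is effectively called with none and the summary collapses to a single row.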
I just would like to understand what's going wrong here.
In the first case (working), I assign the enquo()-ted argument to a variable, in the second case, I use the enquoted argument directly in my call to mutate.
library("dplyr")
library("tidyr") # gather() comes from tidyr
df <- tibble(x = 1:5, y = 1:5, z = 1:5)
# works
myfun <- function(df, transformation) {
  my_transformation <- rlang::enquo(transformation)
  df %>%
    gather("key","value", x,y,z) %>%
    mutate(value = UQ(my_transformation))
}
myfun(df,exp(value))
# does not work
myfun_2 <- function(df, transformation) {
  df %>%
    gather("key","value", x,y,z) %>%
    mutate(value = UQ(rlang::enquo(transformation)))
}
myfun_2(df,exp(value))
#>Error in mutate_impl(.data, dots) : Column `value` is of unsupported type closure
Edit
Here are some more lines to think about :)
When the call is wrapped in quo(), it looks as if the expression to evaluate is built correctly:
# looks as if the whole thing should be working
myfun_2_1 <- function(df, transformation) {
  quo(df %>%
    gather("key","value", x,y,z) %>%
    mutate(value = UQ(rlang::enquo(transformation))))
}
myfun_2_1(df,exp(value))
If you pass this to eval_tidy, it works (it doesn't work without quo()):
# works
myfun_2_2 <- function(df, transformation) {
  rlang::eval_tidy(quo(df %>%
    gather("key","value", x,y,z) %>%
    mutate(value = UQ(rlang::enquo(transformation)))))
}
myfun_2_2(df,exp(value))
If you don't use the pipe, it also works
# works
myfun_2_3 <- function(df, transformation) {
  mutate(gather(df,"key","value", x,y,z), value = UQ(rlang::enquo(transformation)))
}
myfun_2_3(df,exp(value))
Regarding the error message, this is what one gets when one tries to pass types that are not supported by data.frames, e.g.
mutate(df, value = function(x) x)
# Error in mutate_impl(.data, dots) : Column value is of unsupported type closure
To me it looks as if the quosure in myfun_2 isn't evaluated by mutate, which is somewhat interesting and non-intuitive behaviour. Do you think I should report this to the developers?
This limitation is solved in rlang 0.2.0.
Technically: the core of the issue was that magrittr evaluates its arguments in a child of the current environment. It is this child environment that contains the . pronoun. As of 0.2.0, capture of arguments with enquo() and variants is lexically scoped, which means it looks up the stack of parent environments to find the argument to capture. This solves the magrittr problem.
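Independently of the rlang version, capturing the argument before the pipe starts sidesteps the problem, because enquo() is then called directly in the function's own frame. A minimal sketch, assuming dplyr and rlang are installed; transform_value is a hypothetical name, the gather() step is left out for brevity, and !! is used in place of the older UQ().

```r
library(dplyr)

# Hypothetical helper: apply a user-supplied expression to the `value` column.
transform_value <- function(df, transformation) {
  # Capture in the function's own frame, before any pipe is involved.
  transformation <- rlang::enquo(transformation)
  df %>%
    mutate(value = !!transformation)
}

df <- tibble(value = 1:5)
out <- transform_value(df, exp(value))
```

This mirrors the working myfun from the question: the quosure already exists by the time the piped expression is evaluated, so where the pipe evaluates its arguments no longer matters.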
I would like to be able to print the name of a dataframe passed through the pipe. Is this possible? I can do this:
printname <- function(df){
  print(paste(substitute(df)))
}
printname(mtcars)
#[1] "mtcars"
However, it returns "." when this function is piped using the magrittr pipe.
mtcars %>% printname
# [1] "."
This would be helpful when writing custom error messages of functions used in logged production processes -- it's hard to know where something failed if the only thing in the log is "."
It would probably be enough to return the original call, which would include the mtcars %>% piece.
This is a first attempt. It's kind of a hack, but it seems like it might work.
find_chain_parts <- function() {
  i <- 1
  while(!("chain_parts" %in% ls(envir=parent.frame(i))) && i < sys.nframe()) {
    i <- i+1
  }
  parent.frame(i)
}
printfirstname <- function(df){
  ee <- find_chain_parts()
  print(deparse(ee$lhs))
}
mtcars %>% printfirstname
# [1] "mtcars"
The pipe function creates an environment that keeps track of the chain parts. I tried walking up the current execution environments looking for this variable and then using the lhs info stored there to find the symbol at the start of the pipe. This isn't well tested.
As Tom & Lionel Henry commented on MrFlick's answer, the accepted answer no longer works under magrittr 2.
A new answer, then, eschews deparse(substitute()) in favour of sys.calls(). I got this from Artem Sokolov's answer here. I won't pretend to fully understand what's happening, but it works for me:
x_expression <- function(x) {
  # Recursively turn a call into nested lists (a simple AST)
  getAST <- function(ee) purrr::map_if(as.list(ee), is.call, getAST)
  sc <- sys.calls()
  ASTs <- purrr::map( as.list(sc), getAST ) %>%
    purrr::keep( ~identical(.[[1]], quote(`%>%`)) ) # Match first element to %>%
  if( length(ASTs) == 0 ) return( rlang::enexpr(x) ) # Not in a pipe
  dplyr::last( ASTs )[[2]] # Second element is the left-hand side
}
which gives the desired output for both piped and non-piped notation:
x_expression(mtcars)
# mtcars
mtcars %>% x_expression()
# mtcars
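The sys.calls() idea can be illustrated in base R without depending on magrittr's internals. In this sketch %then% is a toy pipe defined only for the example (it is not magrittr's %>% and lacks its features), and first_name is a hypothetical helper that searches the call stack for that operator.

```r
# Toy pipe, defined here only for illustration; it is not magrittr's %>%.
`%then%` <- function(lhs, rhs) rhs(lhs)

first_name <- function(df) {
  pipe_sym <- as.name("%then%")
  calls <- as.list(sys.calls())
  # Which calls on the stack are calls to the toy pipe?
  is_pipe <- vapply(
    calls,
    function(cl) is.call(cl) && identical(cl[[1]], pipe_sym),
    logical(1)
  )
  # Not inside the toy pipe: fall back to the usual substitute() trick.
  if (!any(is_pipe)) return(deparse(substitute(df)))
  # Take the outermost pipe call and descend its left-hand side
  # until the start of the chain is reached.
  lhs <- calls[[which(is_pipe)[1]]][[2]]
  while (is.call(lhs) && identical(lhs[[1]], pipe_sym)) lhs <- lhs[[2]]
  deparse(lhs)
}

direct  <- first_name(mtcars)                     # "mtcars"
piped   <- mtcars %then% first_name               # "mtcars"
chained <- mtcars %then% head %then% first_name   # "mtcars"
```

The call stack still holds the unevaluated pipe expression, so the original left-hand side can be read from it even though the function's own argument has been replaced by a placeholder.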