Piping with dot inside dplyr::filter in R

I'm struggling to pipe something into an argument other than the first of dplyr's filter() using the magrittr pipe %>%.
I would assume that this should work:
library(dplyr)
library(magrittr)
d <- data.frame(a=c(1,2,3),b=c(4,5,6))
c(2,2) %>% filter(d, a %in% .)
But I get this:
# Error in UseMethod("filter_") :
# no applicable method for 'filter_' applied to an object of class "c('double', 'numeric')"
I would expect it to work in the same way as this:
filter(d, a %in% c(2,2))
# a b
# 1 2 5
What am I doing wrong?

The pipe inserts its left-hand side as the first argument of the function on the right. When you want to circumvent this behaviour, you can wrap the call in curly braces, which effectively turns it into an anonymous function, just like when you're writing a function yourself.
5 %>%
{filter(iris, Sepal.Length == .)}
As for why this works: writing {somefunction(x, y)} is roughly equivalent to writing function(...) {somefunction(x, y)}. The resulting function ignores the argument the pipe would normally insert and simply evaluates the expression inside the braces. The . pronoun is defined for it by the pipe, and other variables (like iris) are looked up in the global environment.
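Applied to the data from the question, the same brace trick keeps d as the data argument while . carries the piped values (a minimal sketch; output shown as comments):
library(dplyr)

d <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Braces suppress first-argument insertion, so `d` stays the data argument
# and `.` refers to the piped-in vector:
c(2, 2) %>% {filter(d, a %in% .)}
#   a b
# 1 2 5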

By default it will pipe to the first argument. One way around it is to name the first argument explicitly:
c(2,2) %>%
  filter(.data = d, a %in% .)
but it looks like this doesn't work very well:
a b
1 2 5
Warning message:
In (~.) & (~a %in% .) :
longer object length is not a multiple of shorter object length
P.S. You don't need to load magrittr explicitly, as %>% is re-exported by dplyr.

Related

How can I pass a value to the `filter` function with a pipe without the curly brackets?

How can I pass a value to the filter function with a pipe without using curly brackets?
library(dplyr)
4 %>% {filter(mtcars, cyl == .)} # Works
4 %>% filter(mtcars, cyl == .) # Does not work
# Error in UseMethod("filter") :
#   no applicable method for 'filter' applied to an object of class "c('double', 'numeric')"
You can’t.
The semantics of the ‘magrittr’ pipe are such that the LHS is inserted as the first argument of the RHS unless . is used at the top level of an argument. That is,
x %>% f(...)
is always equivalent to
f(x, ...)
unless ... contains . at the top level.
To suppress this behaviour, %>% specifically provides the {…} syntax which you’re already aware of.
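A minimal sketch of that rule using plain c() calls (outputs shown as comments, assuming magrittr or dplyr is attached):
library(magrittr)

4 %>% c(1, .)        # `.` at the top level: no extra insertion -> [1] 1 4
4 %>% c(1, 2 + .)    # `.` only nested: LHS still inserted first -> [1] 4 1 6
4 %>% {c(1, 2 + .)}  # braces suppress the insertion entirely   -> [1] 1 6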

How do you call a function that takes no inputs within a pipeline?

I tried searching for this but couldn't find any similar questions. Let's say, for the sake of a simple example, I want to do the following using dplyr's pipe %>%.
c(1,3,5) %>% ls() %>% mean()
Setting aside what the use case would be for a pipeline like this, how can I call a function "mid-pipeline" that doesn't need any inputs coming from the left-hand side and just pass them along to the next function in the pipeline? Basically, I want to put an "intermission" or "interruption" of sorts into my pipeline, let that function do its thing, and then continue on my merry way. Obviously, the above doesn't actually work, and I know the T pipe %T>% also won't be of use here because it still expects the middle function to need inputs coming from the lhs. Are there options here shy of assigning intermediate objects and restarting the pipeline?
With the ‘magrittr’ pipe operator you can put an operand inside {…} to prevent automatic argument substitution:
c(1,3,5) %>% {ls()} %>% mean()
# NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA
… but of course this serves no useful purpose.
Incidentally, ls() inside a pipeline is executed in its own environment rather than the calling environment, so using it here is even less meaningful. But a different function that returned a sensible value could be used, e.g.:
c(1,3,5) %>% {rnorm(10)} %>% mean()
# [1] -0.01068046
Or, if you intended for the left-hand side to be passed on, skipping the intermediate ls(), you could do the following:
c(1,3,5) %>% {ls(); .} %>% mean()
# [1] 3
… again, using ls() here won’t be meaningful but some other function that has a side-effect would work.
You could define an auxiliary function like this, which takes an argument it doesn't use so that it fits into the pipe:
ls_return_x <- function(x) {
  print(ls())
  x
}
c(1,3,5) %>% ls_return_x() %>% mean()
Note: the ls() call in this example will print the objects in the environment within the ls_return_x() function. Check out the help page for ls() if you want to list objects from the global environment instead.
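If you do want the helper to report on the global environment, ls() takes an envir argument; a small variant (this redefinition is just an illustration, not part of the original answer):
library(dplyr)

ls_return_x_global <- function(x) {
  print(ls(envir = globalenv()))  # list objects in the global environment
  x                               # pass the piped value through unchanged
}
c(1, 3, 5) %>% ls_return_x_global() %>% mean()
# [1] 3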
I don't know if there is an inbuilt function but you could certainly create a helper function for this
> callfun <- function(x, fun){fun(); return(x)}
> c(1, 3, 5) %>% callfun(fun = ls) %>% mean()
# [1] 3
I don't really see the point but hey - it's your life.

Why doesn't purrr::transpose work with the pipe operator?

I don't understand why purrr::transpose issues an error after the pipe operator.
This isn"t the case with other functions like purrr::map.
See example below:
library(purrr)
# Works
identical(mtcars %>% map(~{.x}), mtcars %>% purrr::map(~{.x}))
# [1] TRUE
# Works
mtcars %>% transpose
# Doesn't work
mtcars %>% purrr::transpose
#Error in .::purrr : unused argument (transpose)
It is possible I have misunderstood the namespace operator (please correct me if so), but this is what I believe the reason to be.
I believe the issue is that the namespace notation is really a call to the infix operator ::. This means that the call the pipe ends up trying to make is:
`::`(mtcars, purrr, transpose)
The error here occurs as the namespace infix operator can only accept two arguments: the package name and the function from the package.
This isn't what the user expects, since we would like to be able to use functions from external namespaces with the pipe operator. The pipe, however, cannot tell which function is actually meant to be called, so it uses the outermost one it finds (in this case ::).
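You can check this reasoning directly by deparsing the right-hand side (a small illustration; the printed output is abbreviated and the exact error wording may differ):
# `purrr::transpose` is itself a call to the function `::` with two arguments:
as.list(quote(purrr::transpose))
# [[1]] `::`    [[2]] purrr    [[3]] transpose   (output abbreviated)

# The pipe inserts the LHS as an extra first argument, which `::` rejects:
try(`::`(mtcars, purrr, transpose))
# Error : unused argument (transpose)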
The solution is to use parentheses, either to indicate that transpose is the function being called or to force purrr::transpose to be evaluated first. We can do this with the following code:
# purrr::transpose is the function
mtcars %>% purrr::transpose()
# Evaluate this block as the expression of the function
mtcars %>% (purrr::transpose)

Passing (function) user-specified column name to dplyr do()

Original question
Can anyone explain to me why unquote does not work in the following?
I want to pass on a (function) user-specified column name in a call to do in version 0.7.4 of dplyr. This seems somewhat less awkward than the older standard-evaluation approach using do_. A basic (successful) example, ignoring the fact that using do here is quite unnecessary, would be something like:
sum_with_do <- function(D, x, ...) {
  x <- rlang::ensym(x)
  gr <- quos(...)
  D %>%
    group_by(!!! gr) %>%
    do(data.frame(y = sum(.[[quo_name(x)]])))
}
D <- data.frame(group=c('A','A','B'), response=c(1,2,3))
sum_with_do(D, response, group)
# A tibble: 2 x 2
# Groups:   group [2]
  group     y
  <fct> <dbl>
1 A        3.
2 B        3.
The rlang:: prefix is unnecessary as of dplyr 0.7.5, which now exports ensym. I have included lionel's suggestion to use ensym here rather than enquo, as the former guarantees that the value of x is a symbol (not an expression).
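A quick way to see that difference (a minimal sketch; f_sym and f_quo are throwaway helpers, and the exact error text varies by rlang version):
library(rlang)

f_sym <- function(x) ensym(x)   # only accepts a bare column name (or string)
f_quo <- function(x) enquo(x)   # accepts any expression, returns a quosure

f_sym(response)      # the symbol `response`
try(f_sym(a + b))    # errors: a call is not a symbol
f_quo(a + b)         # a quosure wrapping the expression a + b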
Unlike in other dplyr examples, unquoting is not useful here: replacing quo_name(x) with !!x in the above produces the following error:
Error in ~response : object 'response' not found
Explanation
As per the accepted response, the underlying reason is that do does not evaluate the expression in the same environment that other dplyr functions (e.g. mutate) use.
I did not find this to be abundantly clear from either the documentation or the source code (e.g. compare the source for mutate and do for data frames, and follow Alice down the rabbit hole if you wish), but essentially (and this is probably nothing new to most):
- do evaluates expressions in an environment whose parent is the calling environment, and binds the current group (slice) of the data frame to the symbol ., while
- other dplyr functions more or less evaluate their expressions in the environment of the data frame, with the calling environment as its parent.
See also Advanced R. 22. Evaluation for a description in terms of 'data masking'.
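A side-by-side sketch of the two behaviours, using the question's D (the failing do() line is left commented out):
library(dplyr)

D <- data.frame(group = c('A', 'A', 'B'), response = c(1, 2, 3))

# mutate() data-masks: the bare column name is resolved inside D
D %>% mutate(y = response * 2)

# do() masks only `.`, so the same bare name is not found:
# D %>% do(data.frame(y = sum(response)))   # Error: object 'response' not found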
This is because of regular do() semantics where there is no data masking apart from .:
do(df, data.frame(y = sum(.$response)))
#> y
#> 1 6
do(df, data.frame(y = sum(.[[response]])))
#> Error: object 'response' not found
So you just need to capture the bare column name as a string and there is no need to unquote since there is no data masking:
sum_with_do <- function(df, x, ...) {
  # ensym() guarantees that `x` is a simple column name and not a
  # complex expression:
  x <- as.character(ensym(x))
  df %>%
    group_by(...) %>%
    do(data.frame(y = sum(.[[x]])))
}
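For completeness, calling the revised function on the question's data gives the same result as before (assuming dplyr and rlang are attached; the printed column types may differ across versions):
D <- data.frame(group = c('A', 'A', 'B'), response = c(1, 2, 3))
sum_with_do(D, response, group)
# A tibble: 2 x 2
# Groups:   group [2]
  group     y
1 A         3
2 B         3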

Cannot combine with() and functions in R

Could somebody please point out to me why is that the following example does not work:
df <- data.frame(ex =rep(1,5))
sample.fn <- function(var1) with(df, mean(var1))
sample.fn(ex)
It seems that I am using the wrong syntax to combine with() and a function.
Thanks,
This is what I meant by learning to use "[" (actually "[["):
> df <- data.frame(ex =rep(1,5))
> sample.fn <- function(var1) mean(df[[var1]])
> sample.fn('ex')
[1] 1
You cannot use an unquoted ex, since there is no object named 'ex', at least not in the global environment where you are making the call to sample.fn. 'ex' is a name only inside the environment of the data frame df, and only df itself is "visible" when the sample.fn function is called.
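The visibility point can be checked directly (a small sketch using the question's df):
df <- data.frame(ex = rep(1, 5))

exists("ex")          # FALSE: there is no object `ex` in the global environment
"ex" %in% names(df)   # TRUE:  `ex` exists only as a column name inside df
mean(df[["ex"]])      # [1] 1: so pass the name as a string and index with [[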
Out of interest, I tried using the method that the with.default function uses to build a function taking an unquoted expression argument in the manner you were expecting:
samp.fn <- function(expr) mean(
  eval(substitute(expr), df, enclos = parent.frame())
)
samp.fn(ex)
#[1] 1
It's not a very useful function, since it would only be applicable when there is a data frame named 'df' in the parent.frame(). And apologies for incorrectly claiming that there was a warning on the help page: as @rawr points out, the warning about using functions that depend on non-standard evaluation appears on the subset help page.
