I would like to know how to properly combine %>% with mapply.
Here is a toy example:
library(data.table)
library(stringr)
A = data.table(a = letters[1:3], b = 3:1)
mapply(function(x, y) str_c(x, "---", y), A$a, A$b)
It returns a named character vector:
a b c
"a---3" "b---2" "c---1"
However, this requires a named variable (A), which I try hard to avoid, and I would like to write it in this form:
A %>% mapply(function(x, y) str_c(x,"---", y), .$a, .$b)
but the result is
object '.' of mode 'function' was not found
How can I make this work?
To explain why your code isn’t working, you need to know that writing obj %>% f(args) always inserts obj as the first argument in the call to f, unless you use . on its own as another argument. In other words, it’s equivalent to
obj %>% f(., args)
Since you don’t use . on its own as an argument (even though you use .$a and .$b), your call is equivalent to
A %>% mapply(., function(x, y) str_c(x,"---", y), .$a, .$b)
… which doesn’t work, since mapply expects its first argument to be a function.
As Ronak’s answer shows, to circumvent this you can put f(args) into {…}:
obj %>% {f(args)}
This syntactic form explicitly disables the rule about inserting . as explained above. It’s a special case defined for this purpose.
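To see the three cases side by side, here is a small sketch. The helper show_args() is hypothetical (it is not part of magrittr); it simply reports what it received, one string per argument, so you can observe when the left-hand side is inserted automatically and when it is not.

```r
library(magrittr)

# Hypothetical helper: returns one string per argument it receives,
# so we can see exactly what the pipe passed in.
show_args <- function(...) {
  vapply(list(...), function(v) paste(v, collapse = ","), character(1))
}

# `.` appears only inside length(), so the LHS is ALSO inserted first:
# equivalent to show_args(1:3, length(1:3))
nested_only <- 1:3 %>% show_args(length(.))

# Writing `.` as a direct argument suppresses the automatic insertion
explicit_dot <- 1:3 %>% show_args(., length(.))

# Braces disable the first-argument rule entirely:
# equivalent to show_args(length(1:3))
with_braces <- 1:3 %>% {show_args(length(.))}
```

Here nested_only and explicit_dot both contain two strings ("1,2,3" and "3"), while with_braces contains only "3".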
Alternatively, you could use another pipe operator from ‘magrittr’, the exposition pipe, %$%. This one works differently: it pulls out named components from the left-hand expression. That way, you could write
A %$% mapply(function(x, y) str_c(x,"---", y), a, b)
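A quick check that the brace form and the exposition pipe give the same result might look like the sketch below. To keep it dependency-light, it assumes a plain data.frame in place of the data.table and base paste0() in place of stringr::str_c(); for these inputs (no NAs) the two functions produce the same strings.

```r
library(magrittr)

# A plain data.frame stands in for the data.table from the question
A <- data.frame(a = letters[1:3], b = 3:1, stringsAsFactors = FALSE)

# Braces: `.` is only referenced explicitly, never inserted automatically
via_braces <- A %>% {mapply(function(x, y) paste0(x, "---", y), .$a, .$b)}

# Exposition pipe: the columns a and b are visible by name
via_exposition <- A %$% mapply(function(x, y) paste0(x, "---", y), a, b)

# paste0() is already vectorised, so mapply() is not strictly needed here;
# note this version is unnamed, whereas mapply() names the result after a
vectorised <- A %$% paste0(a, "---", b)
```

The mapply() results carry the names a, b, c because mapply() uses a character first argument as names (USE.NAMES = TRUE by default).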
This doesn't seem to be the right place to use %>%. If you need it for learning purposes, use:
A %>% {mapply(function(x, y) stringr::str_c(x,"---", y), .$a, .$b)}
I tried searching for this but couldn't find any similar questions. Let's say, for the sake of a simple example, I want to do the following using dplyr's pipe %>%.
c(1,3,5) %>% ls() %>% mean()
Setting aside what the use case would be for a pipeline like this, how can I call a function "mid-pipeline" that doesn't need any inputs coming from the left-hand side and just pass them along to the next function in the pipeline? Basically, I want to put an "intermission" or "interruption" of sorts into my pipeline, let that function do its thing, and then continue on my merry way. Obviously, the above doesn't actually work, and I know the T pipe %T>% also won't be of use here because it still expects the middle function to need inputs coming from the lhs. Are there options here shy of assigning intermediate objects and restarting the pipeline?
With the ‘magrittr’ pipe operator you can put an operand inside {…} to prevent automatic argument substitution:
c(1,3,5) %>% {ls()} %>% mean()
# NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA
… but of course this serves no useful purpose.
Incidentally, ls() inside a pipeline is executed in its own environment rather than the calling environment, so its use here is even less useful. But a different function that returns a sensible value could be used, for example:
c(1,3,5) %>% {rnorm(10)} %>% mean()
# [1] -0.01068046
Or, if you intended for the left-hand side to be passed on, skipping the intermediate ls(), you could do the following:
c(1,3,5) %>% {ls(); .} %>% mean()
# [1] 3
… again, using ls() here won’t be meaningful but some other function that has a side-effect would work.
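As a sketch of that "intermission" pattern with a function that has a visible side effect, the example below uses message() purely as an assumed stand-in for whatever side-effecting call you need; the braces run it and then hand . on unchanged.

```r
library(magrittr)

# The braces run a side effect (message() here, standing in for any
# function whose return value you don't need) and then return `.`
# unchanged, so the next step still receives the left-hand side.
result <- c(1, 3, 5) %>%
  { message("intermission: received ", length(.), " values"); . } %>%
  mean()
```

The message is printed during evaluation and result is 3, the mean of the original vector.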
You could define an auxiliary function like this, which takes an argument it doesn't use so that it fits in the pipe:
ls_return_x <- function(x){
print(ls())
x
}
c(1,3,5) %>% ls_return_x() %>% mean()
Note that the ls() call in this example lists the objects in the environment of the ls_return_x() function (here, just x). See the help page for ls() if you want to list the global environment instead, e.g. ls(envir = globalenv()).
I don't know if there is a built-in function, but you could certainly create a helper function for this:
callfun <- function(x, fun){fun(); return(x)}
c(1, 3, 5) %>% callfun(fun = ls) %>% mean()
# [1] 3
I don't really see the point but hey - it's your life.
I have been using R for a long time and am very happy using the map-family of functions as well as rowwise. I really just don't get the apply-family, even after reading many a tutorial. Right now it's very much up to chance if I get any apply function to work, and if I do, I'm not sure why it did in that case. Could anyone give an intuitive explanation of the syntax? E.g. why does the code below fail?
library(dplyr)
library(tidyr)
library(purrr)

stupid_function = function(x,y){
a = sum(x,y)
b = max(x,y)
return(list(MySum=a,MyMax=b))
}
mtcars %>%
rowwise() %>%
mutate(using_rowwise = list(stupid_function(vs, am))) %>%
unnest_wider(using_rowwise)
mtcars %>%
mutate(using_map = pmap(list(vs,am),stupid_function)) %>%
unnest_wider(using_map)
mtcars %>%
mutate(using_lapply = lapply(list(vs,am), stupid_function))
Using rowwise and pmap I get what I want/expect. But the last line yields the following error:
Error: Problem with `mutate()` input `using_lapply`.
x argument "y" is missing, with no default
i Input `using_lapply` is `lapply(list(vs, am), stupid_function)`.
Run `rlang::last_error()` to see where the error occurred.
The lapply() function has the following usage (from ?lapply).
lapply(X, FUN, ...)
The X argument is a list, vector, or data.frame: something with elements. The FUN argument is some function. lapply then applies FUN to each element of X and returns the outputs in a list. The first element of this list is FUN(X[[1]]), and the second is FUN(X[[2]]).
In your example, lapply(list(vs,am), stupid_function), lapply is trying to apply stupid_function to vs and then to am. However, stupid_function appears to require two arguments. This is where the ... comes in. You pass additional arguments to FUN here. You just need to name them correctly. So, in your case, you would use lapply(vs, stupid_function, y = am).
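A minimal base-R sketch of both points: the failing call and the `...` form. The function add_two() and the short vectors vs and am are hypothetical stand-ins for stupid_function() and the mtcars columns.

```r
# Hypothetical two-argument function, mirroring stupid_function
add_two <- function(x, y) sum(x, y)

vs <- c(0, 1, 1)
am <- c(1, 1, 0)

# lapply() calls add_two(vs) and then add_two(am): y is never supplied
err <- tryCatch(lapply(list(vs, am), add_two),
                error = function(e) conditionMessage(e))

# Extra arguments after FUN are passed through `...` to every call,
# so each element of vs is paired with the WHOLE am vector
fixed_y <- lapply(vs, add_two, y = am)
```

err holds the same "argument "y" is missing" message seen in the question, and fixed_y is list(2, 3, 3): each element of vs is added to the sum of all of am, which illustrates why this is still not the row-by-row result you want.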
However, this isn't really what you want either. This will use all of am as the second argument rather than iterating over am. lapply only iterates over one variable, not two. You want to use a map function for this, or you need to do something like the following:
lapply(1:nrow(mtcars), function(x) {stupid_function(mtcars$vs[x], mtcars$am[x])})
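Within the apply family itself, the function that iterates over two vectors in parallel (like purrr::pmap) is mapply(), or its wrapper Map(). A sketch using only base R:

```r
stupid_function <- function(x, y) {
  list(MySum = sum(x, y), MyMax = max(x, y))
}

# mapply() walks over vs and am in parallel, like purrr::pmap();
# SIMPLIFY = FALSE keeps the result as a list of lists
res_mapply <- mapply(stupid_function, mtcars$vs, mtcars$am, SIMPLIFY = FALSE)

# Map() is a thin wrapper around mapply(..., SIMPLIFY = FALSE)
res_map <- Map(stupid_function, mtcars$vs, mtcars$am)
```

Each result has one element per row of mtcars, so it could replace pmap(list(vs, am), stupid_function) inside mutate() in the same way.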
In short, the pipe operator used with the purrr/dplyr packages is defined as follows:
y%>%f(x,.,z) is the same as f(x,y,z)
I am trying to do the following task using pipe operator. First I show you the task without using pipes:
#####for reproducibility
set.seed(50)
z0<-factor(sample(c(letters[1:3],NA),100,replace = T))
###the task
rep(1,length(table(z0)))
Now I want to do this using pipes:
z0%>%table%>%rep(1,length(.))
However, the result is not the same. It seems that the pipe operator does not substitute the placeholder correctly inside a composition of functions. That is,
y%>%f(x,g(.)) should be the same as f(x,g(y))
So the concrete question is whether it is possible to do
y%>%f(x,g(.))
Thank you in advance for your comments.
%>% implements a first-argument rule: it passes the previous result as the first argument to the function unless . appears as a direct argument. In your second case, the arguments to rep are 1 and length(.); since . appears only inside length(), the first-argument rule takes effect and the call becomes rep(., 1, length(.)). To avoid this, enclose the expression in {}. You can read more about this under "Re-using the placeholder for attributes":
Re-using the placeholder for attributes
It is straight-forward to use the placeholder several times in a
right-hand side expression. However, when the placeholder only appears
in a nested expressions magrittr will still apply the first-argument
rule. The reason is that in most cases this results more clean code.
x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))
The behavior can be overruled by enclosing the right-hand side in
braces:
x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))
rep(1,length(table(z0)))
# [1] 1 1 1
Equivalent would be:
z0 %>% table %>% {rep(1,length(.))}
# [1] 1 1 1
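Another option, sketched below, is to pipe the length through first so that . becomes a direct argument of rep(), which disables the automatic first-argument insertion without needing braces. The names piped and nested are just illustrative.

```r
library(magrittr)

set.seed(50)
z0 <- factor(sample(c(letters[1:3], NA), 100, replace = TRUE))

# Pipe the length through first; `.` is then a direct argument of rep(),
# so no extra first argument is inserted
piped <- z0 %>% table() %>% length() %>% rep(1, .)

# The non-piped version from the question, for comparison
nested <- rep(1, length(table(z0)))
```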
Data for reproducibility
library(dplyr)
.i <- tibble(a=2*1:4+1, b=2*1:4)
This function is supposed to take its data and other arguments as unquoted names, find those names in the data, and use them to add a column and filter out the top row. It does not work: mutate says it cannot find a.
t1 <- function(.j=.i, X=a, Y=b){
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=.j, pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t1(a,b)
This function, which I found by typo (note the .i instead of .j in the mutate statement), does what the previous function was supposed to do, and I don't know why. I think it is skipping over the function arguments and finding .i in the global environment. Or maybe it is using a Ouija board.
t2 <- function(.j=.i, X=a, Y=b){
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=.i, pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t2(a,b)
Since mutate could not find .j when passed to it in the usual R way, maybe it needs to be passed in an rlang-style quosure, like the formals X and Y. This function also does not work, with UQ in mutate saying that it cannot find a. Like the first function above, it works if the .j in mutate is replaced with .i. (It seems like there should be an "enquos" to parallel quos.)
t3 <- function(.j=.i, X=a, Y=b){
e_j <- enquo(.j)
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=UQ(.j), pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t3(a,b)
Finally, it appears that, once the .i substitution in mutate is made, t4() no longer needs a data argument at all. See below, where I replace it with bop_foo_foo. If, however, you replace bop_foo_foo throughout with the name of the data, .i, (t5()) then UQ again fails to find a.
bop_foo_foo <- 0
t4 <- function(bop_foo_foo, X=a, Y=b){
e_j <- enquo(bop_foo_foo)
e_X <- enquo(X)
e_Y <- enquo(Y)
mutate(.data=UQ(.i), pass=UQ(e_X)+1) %>%
filter(UQ(e_Y) > 3) -> out
out
}
t4(a,b)
The functions above seem to me to be relatively minor variants on a single function. I have run dozens more, and although I have observed some patterns and read the enquo and UQ help files I do not know how many times, a real understanding continues to elude me.
I would like to know why the functions above that don't work don't, and why the ones that do work do. I don't necessarily need a function-by-function critique. If you can state general principles that embody the required understanding, that would be delightful, and more than sufficient.
I think it is skipping over the function arguments and finding .i in the global environment.
Yes. Symbol lookup in R is lexically scoped: variables local to a function are looked up first, then the function's enclosing environment is inspected, and so on up the chain of environments.
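A tiny base-R sketch of that lookup rule, independent of dplyr. The functions f() and g() are hypothetical: f() uses .i without defining it (like t2()), while g() has its own .i argument that shadows the global one.

```r
# .i lives in the global environment
.i <- data.frame(a = 1:2)

# .i is not an argument or local variable of f(), so R looks it up
# in f()'s enclosing environment (here, the global environment)
f <- function(.j = 0) nrow(.i)

# Here .i IS a local (an argument), so it shadows the global one
g <- function(.i = data.frame(a = 1:5)) nrow(.i)
```

Calling f() returns 2 (the global .i), and g() returns 5 (its own default).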
mutate(.data = UQ(.j), ...)
I think you are missing the difference between regular arguments and (quasi)quoted arguments. Unquoting is only relevant for quasiquoted arguments. Since the .data argument of mutate() is not quasiquoted it does not make sense to try and unquote stuff. The quasiquoted arguments are the ones that are captured/quoted with enexpr() or enquo(). You can tell whether an argument is quasiquoted either by looking at the documentation or by recognising that the argument supports direct references to columns (regular arguments need to be explicit about where to find the columns).
In the next version of rlang, the exported UQ() function will throw an error to make it clear that it should not be called directly and that it can only be used in quasiquoted arguments.
I would suggest:
Call the first argument of your function data or df rather than .j.
Don't give it a default. The user should always supply the data.
Don't capture it with enquo() or enexpr() or substitute(). Instead pass it directly to the data argument of other verbs.
Once this is out of the way it will be easier to work out the rest.
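Following those suggestions, a minimal sketch of the intended function might look like this. It assumes current dplyr and rlang, and uses the !! operator (the operator form of UQ()) only inside mutate() and filter(), whose arguments are quasiquoted. The name t_fixed is hypothetical.

```r
library(dplyr)
library(rlang)

.i <- tibble(a = 2 * 1:4 + 1, b = 2 * 1:4)

# Hypothetical rewrite: data is a regular argument (no default, no enquo);
# only the column arguments X and Y are quoted
t_fixed <- function(data, X, Y) {
  e_X <- enquo(X)  # capture the column expression for X
  e_Y <- enquo(Y)  # capture the column expression for Y
  data %>%
    mutate(pass = (!!e_X) + 1) %>%  # unquote inside a quasiquoted argument
    filter((!!e_Y) > 3)             # drops the first row, where b == 2
}

out <- t_fixed(.i, a, b)
```

Here the data is passed straight through to mutate(), while a and b are captured with enquo() and unquoted where the verbs expect column references.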
Can somebody explain to me why the following two instructions produce different outputs:
library(plyr)
library(dplyr)
ll <- list(a = mtcars, b = mtcars)
# using '.' as a function parameter
llply(ll, function(.) . %>% group_by(cyl) %>% summarise(min = min(mpg)))
# using 'd' as function parameter
llply(ll, function(d) d %>% group_by(cyl) %>% summarise(min = min(mpg)))
The former case is apparently not even evaluated, which I discovered by misspelling summarise: llply(ll, function(.) . %>% group_by(cyl) %>% sumamrise(min = min(mpg))) does not throw an error.
So this must have something to do with scoping rules and where things are evaluated, but I really want to understand what is going on and why it happens. I use . as an argument in anonymous functions quite often, and I was puzzled to see this outcome.
So long story short, why does . not work with %>%?
This seems to be because of the special use of . as a placeholder when using piping. From ?"%>%":
Using the dot for secondary purposes
Often, some attribute or property
of lhs is desired in the rhs call in addition to the value of lhs
itself, e.g. the number of rows or columns. It is perfectly valid to
use the dot placeholder several times in the rhs call, but by design
the behavior is slightly different when using it inside nested
function calls. In particular, if the placeholder is only used in a
nested function call, lhs will also be placed as the first argument!
The reason for this is that in most use-cases this produces the most
readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is
equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly
more compact. It is possible to overrule this behavior by enclosing
the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is
equivalent to c(min(1:10), max(1:10)).
The . ("the dot") has multiple uses, one of which is indeed as an argument. How it's actually interpreted is highly dependent on its context -- and in your context, it's used immediately before a %>% forward-pipe operator. dplyr takes its forward-pipe operator from magrittr, and from the magrittr documentation we have the following snippet on what happens when there's a . %>% somefunction():
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input.
So it's almost like an order-of-operations issue: when %>% immediately follows the dot, the dot is interpreted as the start of a functional sequence rather than as the function's argument.
One way to get your . understood as an argument instead is to add parentheses around it, i.e.
llply(ll, function(.) (.) %>% group_by(cyl) %>% summarise(min = min(mpg)))
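A small sketch of the difference, using sqrt() and sum() as simple stand-ins for the dplyr verbs. The names fs, dot_fn and paren_fn are hypothetical.

```r
library(magrittr)

# With `.` as the very first LHS, the pipe builds a functional sequence
# (a function) instead of evaluating anything
fs <- . %>% sqrt() %>% sum()

# The same trap inside an anonymous function: it returns a function
dot_fn <- function(.) . %>% sqrt() %>% sum()

# Parentheses make the LHS `(.)` a call rather than the bare dot,
# so the pipe evaluates normally
paren_fn <- function(.) (.) %>% sqrt() %>% sum()
```

fs(c(1, 4, 9)) and paren_fn(c(1, 4, 9)) both return 6, whereas dot_fn(c(1, 4, 9)) returns a function. This is why the misspelled sumamrise in the question never raised an error: the chain was built but never run.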
For a more thorough explanation of the different uses of . and %>%, and their interaction with each other, have a look at https://cran.r-project.org/web/packages/magrittr/magrittr.pdf. The relevant section starts from page 8.