When you review or debug code, you often come across nested function calls like f(g(h(2),2,3),4).
We can rewrite these with magrittr's pipe (%>%) or the base R pipe (|>) by hand, but what if I want to do it automatically, because humans err?
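For example, the nested call above can be rewritten by hand as:
h(2) %>% g(2, 3) %>% f(4)  # equivalent to f(g(h(2), 2, 3), 4)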
I know of pipefittr but it is not perfect as you can see below.
pipefittr("c(3,2,4,c(3,2,4))")
> 3 %>% 3,2,4,c(2) %>% c()
If the arguments are all unique, then it might need something to start with.
pipefittr("c(1,2,3,c(4,5,6))", start="4")
> 4 %>% c(.,5,6) %>% c(1,2,3,.)
and just omitting the . in the first argument would do.
So how can I do this?
I tried searching for this but couldn't find any similar questions. Let's say, for the sake of a simple example, I want to do the following using dplyr's pipe %>%.
c(1,3,5) %>% ls() %>% mean()
Setting aside what the use case would be for a pipeline like this, how can I call a function "mid-pipeline" that doesn't need any inputs coming from the left-hand side and just pass them along to the next function in the pipeline? Basically, I want to put an "intermission" or "interruption" of sorts into my pipeline, let that function do its thing, and then continue on my merry way. Obviously, the above doesn't actually work, and I know the T pipe %T>% also won't be of use here because it still expects the middle function to need inputs coming from the lhs. Are there options here shy of assigning intermediate objects and restarting the pipeline?
With the ‘magrittr’ pipe operator you can put an operand inside {…} to prevent automatic argument substitution:
c(1,3,5) %>% {ls()} %>% mean()
# NA
# Warning message:
# In mean.default(.) : argument is not numeric or logical: returning NA
… but of course this serves no useful purpose.
Incidentally, ls() inside a pipeline is executed in its own environment rather than the calling environment so its use here is even less useful. But a different function that returned a sensible value could be used, e.g.:
c(1,3,5) %>% {rnorm(10)} %>% mean()
# [1] -0.01068046
Or, if you intended for the left-hand side to be passed on, skipping the intermediate ls(), you could do the following:
c(1,3,5) %>% {ls(); .} %>% mean()
# [1] 3
… again, using ls() here won’t be meaningful but some other function that has a side-effect would work.
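For example, a function called only for its side effect (here message(), as a minimal sketch) can be dropped in the same way while the left-hand side is passed through:
c(1,3,5) %>% {message("halfway there"); .} %>% mean()
# halfway there
# [1] 3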
You could define an auxiliary function like this, one that takes an argument it doesn't use so that it fits into the pipe:
ls_return_x <- function(x){
  print(ls())
  x
}
c(1,3,5) %>% ls_return_x() %>% mean()
Note, the ls() call in this example will print the objects in the environment within the ls_return_x() function. Check out the help page for ls() if you want to list the objects in the global environment instead.
I don't know if there is an inbuilt function but you could certainly create a helper function for this
> callfun <- function(x, fun){fun(); return(x)}
> c(1, 3, 5) %>% callfun(fun = ls) %>% mean()
# [1] 3
I don't really see the point but hey - it's your life.
I have been using R for a long time and am very happy using the map-family of functions as well as rowwise. I really just don't get the apply-family, even after reading many a tutorial. Right now it's very much up to chance if I get any apply function to work, and if I do, I'm not sure why it did in that case. Could anyone give an intuitive explanation of the syntax? E.g. why does the code below fail?
stupid_function = function(x, y){
  a = sum(x, y)
  b = max(x, y)
  return(list(MySum = a, MyMax = b))
}
mtcars %>%
  rowwise() %>%
  mutate(using_rowwise = list(stupid_function(vs, am))) %>%
  unnest_wider(using_rowwise)

mtcars %>%
  mutate(using_map = pmap(list(vs, am), stupid_function)) %>%
  unnest_wider(using_map)

mtcars %>%
  mutate(using_lapply = lapply(list(vs, am), stupid_function))
Using rowwise and pmap I get what I want/expect. But the last line yields the following error:
Error: Problem with `mutate()` input `using_lapply`.
x argument "y" is missing, with no default
i Input `using_lapply` is `lapply(list(vs, am), stupid_function)`.
Run `rlang::last_error()` to see where the error occurred.
The lapply() function has the following usage (from ?lapply).
lapply(X, FUN, ...)
The X argument is a list, vector, or data.frame - something with elements. The FUN argument is some function. lapply then applies FUN to each element of X and returns the outputs in a list: the first element of this list is FUN(X[[1]]) and the second is FUN(X[[2]]).
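A minimal illustration of that pattern:
lapply(list(a = 1:3, b = 4:6), sum)
# $a
# [1] 6
#
# $b
# [1] 15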
In your example, lapply(list(vs,am), stupid_function), lapply is trying to apply stupid_function to vs and then to am. However, stupid_function appears to require two arguments. This is where the ... comes in. You pass additional arguments to FUN here. You just need to name them correctly. So, in your case, you would use lapply(vs, stupid_function, y = am).
However, this isn't really what you want either. This will use all of am as the second argument rather than iterating over it; lapply only iterates over one variable, not two. You want to use a map function for this, or you need to do something like the following:
lapply(1:nrow(mtcars), function(x) stupid_function(mtcars$vs[x], mtcars$am[x]))
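Base R also has Map() and mapply(), which iterate over several vectors in parallel; a sketch using the same example data:
Map(stupid_function, mtcars$vs, mtcars$am)
# or, equivalently
mapply(stupid_function, mtcars$vs, mtcars$am, SIMPLIFY = FALSE)
Both return a list with one two-element result per row, much like the pmap() version above.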
Can somebody explain to me why the two following instructions have different outputs:
library(plyr)
library(dplyr)
ll <- list(a = mtcars, b = mtcars)
# using '.' as a function parameter
llply(ll, function(.) . %>% group_by(cyl) %>% summarise(min = min(mpg)))
# using 'd' as function parameter
llply(ll, function(d) d %>% group_by(cyl) %>% summarise(min = min(mpg)))
The former case is apparently not even evaluated (which I figured out by deliberately misspelling summarise: llply(ll, function(.) . %>% group_by(cyl) %>% sumamrise(min = min(mpg))) would not throw an error).
So this has all to do with scoping rules and where things are evaluated, but I really want to understand what is going on, and why this happens? I use . as an argument in anonymous functions quite often and I was puzzled to see the outcome.
So long story short, why does . not work with %>%?
This seems to be because of the special use of . as a placeholder when using piping. From ?"%>%":
Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
The . ("the dot") has multiple uses, one of which is indeed as an argument. How it's actually interpreted is highly dependent on its context -- and in your context, it's used immediately before a %>% forward-pipe operator. dplyr takes its forward-pipe operator from magrittr, and from the magrittr documentation we have the following snippet on what happens when there's a . %>% somefunction():
When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input.
So it's almost like an order of operations thing - a %>% immediately after the dot would interpret the dot as a part of the functional sequence.
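You can see the functional-sequence behaviour directly; when the dot itself is the left-hand side, magrittr builds a function instead of computing a value:
library(magrittr)
f <- . %>% sum() %>% sqrt()   # a functional sequence, i.e. a function
f(1:8)
# [1] 6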
One way to get your . understood as an argument instead is to add parentheses around it, i.e.
llply(ll, function(.) (.) %>% group_by(cyl) %>% summarise(min = min(mpg)))
For a more thorough explanation of the different uses of . and %>%, and their interaction with each other, have a look at https://cran.r-project.org/web/packages/magrittr/magrittr.pdf. The relevant section starts from page 8.
I'm trying to make a function that subsets and mutates data with dplyr commands. My fake data is like this:
newTest_rv <- data.frame(is_op = c(rep(0,6), rep(1,4)),
                         has_click = c(0,0,1,1,1,1,0,0,1,1),
                         num_pimp = c(3,5,1,2,3,5,2,5,3,5),
                         freq = c(rep(1,5),5,1,2,1,2))
And my function is like this:
reweight <- function(data, conds){
  require(dplyr)
  require(lazyeval)
  data %>%
    filter_(lazy(conds)) %>%
    group_by(num_pimp) %>%
    mutate_(lazy(new_num) = lazy(num_pimp) - lazy(sum(freq[lazy(!conds)]))) %>%
    mutate(new_weight = freq*(1/new_num)) %>%
    ungroup()
}
> reweight(newTest_rv, is_op==0)
The non-standard evaluation with the conditional statement "is_op==0" seems to work in other places but not in the subset within a group "lazy(sum(freq[lazy(!conds)]))". Is there any way I can circumvent this problem?
Thank you!
It looks like you went a bit overboard with the lazys. The lazy() function creates a lazy object, which basically delays evaluation of an expression. You can't just compose standard expressions and lazy expressions; generally you combine them via lazyeval's interp() function. I think what you want is
mutate_(new_num = interp(~num_pimp - sum(freq[!(x)]), x=lazy(conds)))
Here we use interp() to take a standard expression (in this case one that uses the formula syntax) and insert the lazy expression as a subsetting vector.
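Putting that line back into the question's function, a minimal sketch of the corrected reweight() (assuming the same lazyeval approach and the newTest_rv data above) would be:
reweight <- function(data, conds){
  require(dplyr)
  require(lazyeval)
  data %>%
    filter_(lazy(conds)) %>%
    group_by(num_pimp) %>%
    mutate_(new_num = interp(~num_pimp - sum(freq[!(x)]), x = lazy(conds))) %>%
    mutate(new_weight = freq * (1 / new_num)) %>%
    ungroup()
}
reweight(newTest_rv, is_op == 0)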
I have seen the use of the %>% (percent greater than percent) operator in some packages like dplyr and rvest. What does it mean? Is it a way to write closure blocks in R?
%...% operators
%>% has no built-in meaning, but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function will return a string consisting of its left argument followed by a comma and space and then its right argument.
"%,%" <- function(x, y) paste0(x, ", ", y)
# test run
"Hello" %,% "World"
## [1] "Hello, World"
The base of R provides %*% (matrix multiplication), %/% (integer division), %in% (is lhs a component of the rhs?), %o% (outer product) and %x% (kronecker product). It is not clear whether %% falls in this category or not, but it represents modulo.
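A quick illustration of a few of these:
5 %/% 2            # integer division
# [1] 2
5 %% 2             # modulo
# [1] 1
3 %in% c(1, 2, 3)  # membership
# [1] TRUE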
expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R .
operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf
igraph This package defines %--% , %->% and %<-% to select edges.
lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .
Pipes
magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
magrittr has also defined a number of other such operators too. See the Additional Pipe Operators section of the prior link, which discusses %T>%, %<>% and %$%, and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.
dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated and dplyr now recommends that users use %>% which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments this SO question discusses the differences between it and magrittr's %>% : Differences between %.% (dplyr) and %>% (magrittr)
pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/
The pipeR package also has defined a number of other such operators too. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf
postlogic The postlogic package defined %if% and %unless% operators.
wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html
Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:
1:8 %>% sum %>% sqrt
## [1] 6
one writes the following. In this case we explicitly use the dot rather than eliding the dot argument, and we end each component of the pipeline with an assignment to the variable whose name is dot (.), followed by a semicolon.
1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6
Update Added info on expm package and simplified example at top. Added postlogic package.
Update 2 The development version of R has defined a |> pipe. Unlike magrittr's %>% it can only substitute into the first argument of the right hand side. Although limited, it works via syntax transformation so it has no performance impact.
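For example, with R 4.1 or later:
1:8 |> sum() |> sqrt()
## [1] 6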
%>% is similar to a pipe in Unix. For example, in
a <- combined_data_set %>% group_by(Outlet_Identifier) %>% tally()
the output of combined_data_set will go into group_by and its output will go into tally, then the final output is assigned to a.
This gives you a handy and easy way to use functions in series without creating variables to store intermediate values.
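Without the pipe, the same computation would be written as nested calls and read inside-out (using the same hypothetical data set):
a <- tally(group_by(combined_data_set, Outlet_Identifier))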
My understanding after reading the link offered by G.Grothendieck is that %>% is an operator that pipes functions. This helps readability and productivity, as it's easier to follow the flow of multiple functions through these pipes than to read backwards when multiple functions are nested.
The R packages dplyr and sf import the operator %>% from the R package magrittr.
Help is available by using the following command:
?'%>%'
Of course the package must be loaded first, e.g. by using
library(sf)
The documentation of the magrittr forward-pipe operator gives a good example:
When functions require only one argument, x %>% f is equivalent to f(x)
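For example (after loading any package that exports %>%, such as magrittr):
library(magrittr)
16 %>% sqrt()   # equivalent to sqrt(16)
# [1] 4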
Another example of a %...% operator is %<-%, the multiple-assignment operator (provided by the zeallot package), for example:
library(zeallot)  # provides %<-%

session <- function(){
  x <- 1
  y <- 2
  z <- y + x
  list(x, y, z)
}

c(var1, var2, result) %<-% session()
I don't know much about it, but I have seen it used in a case study on the multivariate normal distribution in R during my college coursework.
Suppose you have a data frame in a variable called "df_gather" and you want to pipe it into a ggplot; then you can use %>%.
E.g.:
df_gather %>%
  ggplot(aes(x = Value, fill = Variable, color = Variable)) +
  geom_density(alpha = 0.3) +
  ggtitle('Distribution of X')