Where is the purrr ~ operator documented?

I searched for ??"~" but this only points me to rlang::env_bind (presumably, %<~%) and base::~. Within RStudio, how can I find Purrr's ~'s documentation? For example, if I forgot how to use ~ with two inputs, where do I look?

There is a good explanation in Advanced R (link given in another answer). There is also a short description (usage example) on the first page of the purrr cheatsheet, bottom left.
The usage of multiple arguments with the tilde ~ can be seen in the documentation of purrr's individual functions. For example, see the description of the .f argument in ?map, which states:
.f
A function, formula, or vector (not necessarily atomic).
If a function, it is used as is.
If a formula, e.g. ~ .x + 2, it is converted to a function. There are three ways to refer to the arguments:
For a single argument function, use .
For a two argument function, use .x and .y
For more arguments, use ..1, ..2, ..3 etc
This syntax allows you to create very compact anonymous functions.
Moreover, R (as of version 4.1.0) has introduced a similar kind of shorthand notation for functions:
R now provides a shorthand notation for creating functions, e.g. \(x) x + 1 is parsed as function(x) x + 1.
This shorthand notation may also prove useful in functions outside the tidyverse; the main difference from the tilde is that arguments do not get default names (.x, .y) and must be named explicitly. That explicit naming can itself be an advantage when one anonymous function has to be nested inside another, a case where the tilde style of notation will not work.
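For example, a quick sketch contrasting the two notations (assuming purrr is installed and R >= 4.1.0 for the \(x) shorthand):
library(purrr)
# single argument: use .
map_dbl(1:3, ~ . + 2)
# [1] 3 4 5
# two arguments: use .x and .y
map2_dbl(1:3, 4:6, ~ .x + .y)
# [1] 5 7 9
# base R shorthand: arguments are named explicitly
map_dbl(1:3, \(x) x + 2)
# [1] 3 4 5
# explicit names make nesting unambiguous, which the default .x/.y names cannot express
map(1:2, \(i) map_dbl(1:3, \(j) i * j))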

When you use ~ within the context of purrr functions, it is passed to as_mapper(), which in turn calls as_function() from rlang. These help files cover the basics of what is needed to use it. This is further documented in the Advanced R book, Chapter 9 (Section 9.2.2), which has a few good examples, and the chapter goes on to build on those ideas.
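A small sketch of that conversion:
library(purrr)
# as_mapper() turns the formula into an ordinary function
f <- as_mapper(~ .x + .y)
f(1, 2)
# [1] 3
# the same conversion happens implicitly inside map(), map2(), and friends
map_dbl(1:3, ~ .x * 10)
# [1] 10 20 30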

Related

R statistics programming: using magrittr piping to pass 2 parameters to a function

I am using magrittr and was able to pass one variable to an R function via its pipes, and also to pick which parameter goes where in a multivariable function F(x, y, z, ...).
But I want to pass 2 parameters at the same time.
For example, I will use the select function from dplyr and pass in tableName and ColumnName.
I thought I could do it like this:
tableName %>% ColumnName %>% select(., .)
But this did not work.
Hope someone can help me with this.
EDIT :
Some commenters below say that this is a duplicate of a question linked elsewhere.
But based on the algebraic structure of magrittr's definition of the pipe for multivariable functions, it should be "doable" purely from the definition of the pipe itself.
The linked question goes beyond the base definition and employs other external functions and/or libraries to try to pass multiple parameters to the function.
I am looking for a solution, IF POSSIBLE, using just the magrittr library and other base operations.
So this is the restriction that is placed on this problem.
In most of my university courses in math and computer science we were restricted to using only what was taught in the course. So when I say I am using dplyr and magrittr, that should imply those are the only things one is permitted to use; the question is posed under this constraint.
Hope this clarifies the scope of possible solutions here.
And if it's not possible to do this with just these libraries, I want someone to tell me that it cannot be done.
I think we need a little more detail about exactly what you want, but as I understand the problem, one solution might be:
list(x = tableName, y = "ColumnName") %>% { select(eval(.$x), .$y) }
This is just a modification of the code linked in the chat. The issue with other implementations is that the first and second inputs to select() must be of specific (and different) types. So just plugging in two strings or two objects won't work.
In the same spirit, you can also use either:
list(x = "tableName", y = "ColumnName") %>% { select(get(.$x),.$y) }
or
list(tableName, "ColumnName") %>% do.call("select", .).
Note, however, that all of these functions (i.e., get(), eval(), and do.call()) have an environment specification in them and could result in errors if improperly specified. They work just fine in these examples because everything is happening in the global environment, but that might change if they were, e.g., called in a function.
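For a concrete, runnable sketch of the same three patterns, using the built-in mtcars data set in place of tableName and "mpg" in place of "ColumnName" (newer dplyr/tidyselect versions may suggest wrapping the string in all_of()):
library(dplyr)
# the data.frame itself plus a column name given as a string
list(x = mtcars, y = "mpg") %>% { select(eval(.$x), .$y) }
# both passed as strings, resolved with get()
list(x = "mtcars", y = "mpg") %>% { select(get(.$x), .$y) }
# a positional list handed to select() via do.call()
list(mtcars, "mpg") %>% do.call("select", .)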

How does R ggplot2 get the column names via aes?

I understand how to use aes, but I don't understand the programmatic paradigm.
When I use ggplot, assuming I have a data.frame with column names "animal" and "weight", I can do the following.
ggplot(df, aes(x=weight)) + facet_grid(~animal) + geom_histogram()
What I don't understand is that weight and animal are not strings; they are just typed out as is. How is it that I can do that? I would have expected it to be something like this instead:
ggplot(df, aes(x='weight')) + facet_grid('~animal') + geom_histogram()
I don't "declare" weight or animal as vectors anywhere? This seems to be... really unusual? Is this like a macro or something where it gets aes "whole," looks into df for its column names, and then fills in the gaps where it sees those variable names in aes?
I guess what I would like is to see some similar function in R which can take variables which are not declared in the scope, and the name of this feature, so I can read further and maybe implement my own similar functions.
In R this is called non-standard evaluation. There is a chapter on non-standard evaluation in the Advanced R book, available free online. Basically, R can look at the call stack to see the symbol that was passed to the function rather than just the value that symbol points to. It's used a lot in base R, and it's used in a slightly different way in the tidyverse, which has a formal class called a quosure to make this stuff easier to work with.
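A minimal base-R sketch of the idea (capture_name() is just an illustrative name, not an existing function):
# substitute() retrieves the unevaluated expression the caller supplied,
# and deparse() turns that symbol into a string
capture_name <- function(x) deparse(substitute(x))
capture_name(weight)
# [1] "weight"
Note that this works even though no object called weight exists in the workspace; only the symbol is captured.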
These methods are great for interactive programming. They save keystrokes and clutter, but if you write functions that depend too heavily on them, those functions become difficult to script or to include in other functions.
The formula syntax (the one with the ~) is probably the safest and most programmatic way to work with symbols. It captures symbols that can later be evaluated in the context of a data.frame with functions like model.frame(). And there are built-in functions to help manipulate formulas, like update() and reformulate().
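For example, a small sketch:
df <- data.frame(animal = c("cat", "dog"), weight = c(4, 20))
f <- ~ animal              # captures the symbol; nothing is evaluated yet
model.frame(f, data = df)  # evaluated later in the context of df
#   animal
# 1    cat
# 2    dog
reformulate("animal")      # builds ~animal from a string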
And since you were explicitly interested in the aes() call: you can get the source code for any function in R just by typing its name without parentheses. With ggplot2 2.2.1, the function looks like this:
aes
# function (x, y, ...)
# {
#     aes <- structure(as.list(match.call()[-1]), class = "uneval")
#     rename_aes(aes)
# }
# <environment: namespace:ggplot2>
The newest version of ggplot2 instead uses rlang methods, to be more consistent with the other tidyverse libraries, so its source looks a bit different.

Function argument matching: by name vs by position

What is the difference between these lines of code?
mean(some_argument)
mean(x = some_argument)
The output is the same, but does the explicit mention of x have any advantages?
People typically don't add argument names for commonly used arguments, such as the x in mean, but almost always name the na.rm argument when removing missing values.
While omitting argument names makes for compact code, here are four (related) reasons for including the names of arguments rather than relying on their position (a short illustration follows below).
Re-order arguments as needed. When you refer to the arguments by name, you can arbitrarily re-order the arguments and still produce the desired result. Sometimes it is useful to re-order your arguments. For example, when running a loop over one of the arguments, you might prefer to put the looped argument in the front of the function.
It is typically safer / more future-proof. For example, if a user-written function or package re-orders its arguments in an update and you relied on their positions, your code would break. In the best case you would get an error; in the worst case the function would run but return an incorrect result. Including the argument names greatly reduces the chances of running into either case.
For greater code clarity. If an argument is rarely used or you want to be explicit for future readers of your code (including you 2 months from now), adding the names can make for easier reading.
Ability to skip arguments. If you want to only change the third argument, then referring to it by name is probably preferable.
See also the R Language Definition: 4.3.2 Argument matching
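A short illustration of these points:
x <- c(1, NA, 3)
mean(x, na.rm = TRUE)      # x matched by position, na.rm by name
# [1] 2
mean(na.rm = TRUE, x = x)  # named arguments can appear in any order
# [1] 2
seq(1, 10, length.out = 4) # naming an argument lets you skip the ones before it (here, `by`)
# [1]  1  4  7 10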

What do . (dot) and % (percentage) mean in R?

My question might sound stupid, but I have noticed that . and % are often used in R and, to be frank, I don't really know why.
I have seen them in dplyr (go here for an example) and data.table (i.e. .SD), but I am sure they must be used in other places as well.
Therefore, my question is:
What does . mean? Is it some kind of R coding best-practice nomenclature? (e.g. _functionName is often used in JavaScript to indicate a private function.) If yes, what's the rule?
Same question for %, which is also often used in R (e.g. %in%, %>%, ...).
My guess has always been that . and % are a convenient way to quickly call a function, but the way data.table uses . does not follow this logic, which confuses me.
. has no inherent/magical meaning in R. It's just another character that you can use in symbol names. But because it is so convenient to type, it has been given special meaning by certain functions and conventions in R. Here are just a few:
. is used to look up S3 generic method implementations. For example, if you call a generic function like plot with an object of class lm as the first parameter, R will look for a function named plot.lm and, if found, call it.
often . in formulas means "all other variables"; for example, lm(y ~ ., data = dd) will regress y on all the other variables in the data.frame dd.
libraries like dplyr use it as a special variable name to indicate the current data.frame for methods like do(). They could just as easily have chosen the variable name X instead.
functions like bquote use .() as a special function to escape variables in expressions
variables that start with a period are considered "hidden" and will not show up with ls() unless you call ls(all.names=TRUE) (similar to the UNIX file system behavior)
However, you can also just define a variable named my.awesome.variable <- 42 and it will work just like any other variable.
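A few of these conventions side by side, sketched with the built-in mtcars data:
# a dot is just another character in a name
my.awesome.variable <- 42
# '.' in a formula means "all other variables"
lm(mpg ~ ., data = mtcars)
# names starting with '.' are hidden from ls() by default
.hidden <- 1
ls()                    # .hidden does not appear
ls(all.names = TRUE)    # now it does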
A % by itself doesn't mean anything special, but R allows you to define your own infix operators in the form %<something>% using two percent signs. If you define
`%myfun%` <- function(a, b) {
  a * 3 - b * 2
}
you can call it like
5 %myfun% 2
# [1] 11
MrFlick's answer doesn't cover the usage of . in data.table;
In data.table, . is (essentially) an alias for list, so any* call to [.data.table that accepts a list can also be passed an object wrapped in .().
So the following are equivalent:
DT[ , .(x, y)]
DT[ , list(x, y)]
*Well, not quite: any use in the j argument, yes; elsewhere it is a work in progress, see here.
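A runnable sketch of the equivalence:
library(data.table)
DT <- data.table(x = 1:3, y = letters[1:3])
identical(DT[, .(x, y)], DT[, list(x, y)])
# [1] TRUE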

lapply-ing with the "$" function

I was going through some examples in Hadley's guide to functionals and came across an unexpected problem.
Suppose I have a list of model objects,
x <- 1:3; y <- 3:1; bah <- list(lm(x ~ y), lm(y ~ x))
and want to extract something from each (as suggested in hadley's question about a list called "trials"). I was expecting one of these to work:
lapply(bah,`$`,i='call') # or...
lapply(bah,`$`,call)
However, these return nulls. It seems like I'm not misusing the $ function, as these things work:
`$`(bah[[1]],i='call')
`$`(bah[[1]],call)
Anyway, I'm just doing this as an exercise and am curious where my mistake is. I know I could use an anonymous function, but think there must be a way to use syntax similar to my initial non-solution. I've looked through the places $ is mentioned in ?Extract, but didn't see any obvious explanation.
I just realized that this works:
lapply(bah,`[[`,i='call')
and this
lapply(bah,function(x)`$`(x,call))
Maybe this just comes down to some lapply voodoo that demands anonymous functions where none should be needed? I feel like I've heard that somewhere on SO before.
This is documented in ?lapply, in the "Note" section (emphasis mine):
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g. bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[0L]],
...), with 0L replaced by the current integer index. This is not
normally a problem, but it can be if FUN uses sys.call or
match.call or if it is a primitive function that makes use of the
call. This means that it is often safer to call primitive functions
with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x))
is required in R 2.7.1 to ensure that method dispatch for is.numeric
occurs correctly.
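To make the Note concrete with the example from the question (the NULL results are the behaviour reported above):
x <- 1:3; y <- 3:1
bah <- list(lm(x ~ y), lm(y ~ x))
# `$` does not evaluate its second argument, so called through lapply it never
# sees 'call' as the element name and returns NULL (as reported in the question)
lapply(bah, `$`, "call")
# `[[` evaluates its arguments normally, so this works
lapply(bah, `[[`, "call")
# wrapping the primitive in an ordinary function also works, as the Note recommends
lapply(bah, function(m) `$`(m, call))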

Resources