Why is := allowed as an infix operator? - r

I have come across the popular data.table package and one thing in particular intrigued me. It has an in-place assignment operator
:=
This is not defined in base R. In fact, if you haven't loaded the data.table package, trying to use it (e.g., a := 2) raises an error with the message:
Error: could not find function ":="
Also, why does := work? Why does R let you define := as an infix operator when every other user-defined infix function has to be wrapped in percent signs (%...%)? For example:
`:=` <- function(a, b) {
paste(a,b)
}
"abc" := "def"
Clearly it's not meant to be an alternative syntax to %function.name% for defining infix functions. Is data.table exploiting some parsing quirks of R? Is it a hack? Will it be "patched" in the future?

It is something that the base R parser recognizes and seems to parse as a left assignment (at least in terms of order of operations and such). See the C source code for more details.
as.list(parse(text="a:=3")[[1]])
# [[1]]
# `:=`
#
# [[2]]
# a
#
# [[3]]
# [1] 3
As far as I can tell, it's undocumented as far as base R is concerned. But it is an operator whose behavior you can define yourself:
`:=`<-function(a,b) {a+b}
3 := 7
# [1] 10
As you can see there really isn't anything special about the ":" part itself. It just happens to be the start of a compound token.
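A quick way to check the parsing claim, as a sketch in base R with no packages loaded: quote() only parses and never evaluates, so it succeeds even when no function named := exists, and the resulting call shows that := binds as loosely as <- does. The names a and b are just placeholders.

```r
# quote() parses without evaluating, so no `:=` function is needed here.
e <- quote(a := b + 1)
is.call(e)               # TRUE
as.character(e[[1]])     # ":="

# The whole `b + 1` ends up as the right-hand side, as it would with `<-`.
deparse(e[[3]])          # "b + 1"

# The same expression written with `<-` has the same shape.
f <- quote(a <- b + 1)
identical(e[[3]], f[[3]])  # TRUE
```

If := had the precedence of a %op% operator instead, the expression would have grouped as (a := b) + 1.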

It's not just a colon operator but rather := is a single operator formed by the colon and equal sign (just as the combination of "<" and "-" forms the assignment operator in base R). The := operator is an infix function that is defined to be part of the evaluation of the "j" argument inside the [.data.table function. It creates or assigns a value to a column designated by its LHS argument using the result of evaluating its RHS.

Related

Create Function with special character R

I would like to create a function with the percentage sign in R. Something similar to the pipe operator in magrittr (%>%).
Here the code
%|%(x) <- function(x){...}
Unfortunately I received the following error:
Error: unexpected SPECIAL in "%|%"
Is there anything I am missing?
Thank you for your help
Syntactically invalid names need to be wrapped in backticks (`…`) to be used in code. This includes operators when using them as regular R names rather than infix operators. This is the case when you want to define them:
`%|%` <- function(a, b) a + b
It’s also the case when you want to pass them into a higher-order function such as sapply:
sapply(1 : 5, `-`)
# [1] -1 -2 -3 -4 -5
(Of course this particular example is pretty useless since most operators are vectorised so you could just write - (1 : 5) instead of the above.)
You might also see code that uses quotes instead of backticks but this is discouraged.
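To put the pieces above together, here is a small sketch that reuses the %|% definition from this answer. It shows the operator called in infix form, in prefix form, and passed to a higher-order function, and why backticks and quotes are not interchangeable in general.

```r
# Define the operator; the backticks make the name usable as a regular symbol.
`%|%` <- function(a, b) a + b

1 %|% 2              # infix use: 3
`%|%`(1, 2)          # prefix use, same result: 3
Reduce(`%|%`, 1:4)   # passed as a value to a higher-order function: 10

# Backticks refer to the object; quotes just create a character string.
is.function(`%|%`)   # TRUE
is.function("%|%")   # FALSE
```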

If `[` is a function for subsetting in R, what is `]`?

I'm reading the introduction to Advanced R by Hadley Wickham, where he states that [ (and +, -, {, etc.) are functions, so that [ can be used in this manner:
> x <- list(1:3, 4:9, 10:12)
> sapply(x, "[", 2)
[1] 2 5 11
Which is perfectly fine and understandable. But if [ is the function required to subset, does ] have another use rather than a syntactical one?
I found that:
> `]`
Error: object ']' not found
so I assume there is no other use for it?
This is the fundamental difference between syntax and semantics. Semantics require that — in R — things like subsetting and if etc are functions. That’s why R defines functions `[`, `if` etc.
And then there’s syntax. And R’s syntax dictates that the syntax for if is either if (condition) expression or if (condition) expression else expression. Likewise, the syntax for subsetting in R is obj[args…]. That is, ] is simply a syntactic element and it has no semantic equivalent, no corresponding function (same as else).
To make this perhaps even clearer:
[ and ] are syntactic elements in R that delimit a subsetting expression.
By contrast, `[` (note the backticks!) is a function that implements the subsetting operation.
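A short sketch of that distinction in base R. The lookups are restricted to the base environment so that nothing defined in your own workspace gets in the way; the vector x is just an example.

```r
x <- c(10, 20, 30)

# The syntax x[2] is turned into a call to the function `[`.
identical(x[2], `[`(x, 2))                       # TRUE
identical(quote(x[2])[[1]], as.name("["))        # TRUE

# `[` and `if` exist as functions in base R; `]` and `else` do not.
exists("[",    envir = baseenv(), inherits = FALSE)  # TRUE
exists("if",   envir = baseenv(), inherits = FALSE)  # TRUE
exists("]",    envir = baseenv(), inherits = FALSE)  # FALSE
exists("else", envir = baseenv(), inherits = FALSE)  # FALSE
```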
Somehow, though, I was expecting ] to have a default meaning of its own: indexing from the end. So I defined it myself in my code:
"]" <- function(x,y) if (y <= length(x)) x[length(x)+1-y] else NA
With the given example, then:
sapply(x, "]", 1)
[1] 3 9 12
sapply(x, "]", 2)
[1] 2 8 11

How to "rollback" after doing `+` = `-`?

Main Question
I did
> `+` = `-`
> 5 + 2
[1] 3
How can I "rollback" without restarting the console? Doing
> `+` = sum
of course, restores one function of + but not all. For example
> c(3,4) + c(1,2)
[1] 10
How could I restore other functions of +?
Extra related questions
Is there a name for this "kind of assignment" or the kind of functions that "+" and "-" represent?
What terms can be used to differentiate the function "+" from the function "%+%" that one could create doing
`%+%` = function(x,y){print(paste(x,"+",y,"=",x+y))}
rm() removes an object from your workspace.
rm(`+`)
will remove your custom definition that masks the built-in function.
There is nothing special about the assignment you did. As nrussell points out, infix operators (aka binary operators) are generally possible to define by wrapping them in percent signs. The basic math ones (+, -, *, /, ^, even = and <- and logical operators, ==, |, ||, &, &&, <, etc.) are special in that the parser knows they're binary operators even without being wrapped in %. You can see ?Arithmetic (alias ?"+") and ?base::Ops for more details.
You can override this by fully qualifying the function in reassignment:
`+` = `-`
5 + 2
#[1] 3
`+` <- base::`+`
5 + 2
#[1] 7
It's probably better to just rm the new function though, as Gregor suggests; otherwise you will just have an extra object floating around your environment needlessly.
Functions such as +, -, *, etc., and even %+% are called infix operators. The difference is that the former are built into the R language (the parser recognizes them directly, and they are implemented as primitives), and therefore do not need to be wrapped in % % to avoid generating a parsing error.
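For completeness, a sketch of the rm() route, which is described above but not shown in full. The masking definition is just an ordinary object that sits in front of the base one, so removing it brings the original back.

```r
# Mask the built-in `+` with subtraction.
`+` <- `-`
5 + 2                                   # 3
masked <- identical(`+`, base::`+`)     # FALSE: lookup finds our copy first

# The original is untouched and still reachable by qualifying it.
base::`+`(5, 2)                         # 7

# Removing the masking object restores normal behaviour.
rm(`+`)
5 + 2                                   # 7
restored <- identical(`+`, base::`+`)   # TRUE
```

The variables masked and restored are only there to record the two states; they are not needed for the rollback itself.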

The function of parentheses (round brackets) in R

How does R interpret parentheses? Like most other programming languages these are built-in operators, and I normally use them without thinking.
However, I came across this example. Let's say we have a data.table in R, and I would like to apply a function to its columns. Then I might write:
dt <- data.table(my_data)
important_cols <- c("col1", "col2", "col5")
dt[, (important_cols) := lapply(.SD, my_func), .SDcols = important_cols]
Obviously I can't neglect the parentheses:
dt[, important_cols := lapply(.SD, my_func), .SDcols = important_cols]
as that would introduce a new column called important_cols to my data.table, instead of modifying my existing columns in place.
My question is, why does putting ( ) around the vector "expand" it?
This question can probably be better phrased and titled. But then I would probably have found the answer by Googling if I knew the terminology to use while asking it, hence I'm here.
While we're on that topic, if someone could point out the differences between [ ], { }, etc., and how they should be used, that would be appreciated too :)
A special feature of R (compared to e.g. C++) is that the various parentheses are actually functions. What this means is that (a) and a are different expressions. The second is just a, while the first is the function ( called with an argument a. Here are a few expression trees for you to compare:
as.list(substitute( a ))
#[[1]]
#a
as.list(substitute( (a) ))
#[[1]]
#`(`
#
#[[2]]
#a
as.list(substitute( sqrt(a) ))
#[[1]]
#sqrt
#
#[[2]]
#a
Notice how similar the last two trees are: in one the function is sqrt, in the other it's "(". In most places in R, the "(" function doesn't do anything, it just returns its argument unchanged, but in the particular case of data.table, it is "overridden" (in quotes because that's not exactly how it's done, but in spirit it is) to do a variety of useful operations.
And here's one more demo to hopefully cement the point:
`(` = function(x) x*x
2
#[1] 2
(2)
#[1] 4
((2))
#[1] 16
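To make the "in spirit" part more concrete, here is a toy sketch in base R of how a function can look at its unevaluated argument and treat (cols) differently from cols. This is not data.table's actual code; lhs_names is a hypothetical helper made up for illustration.

```r
# Hypothetical helper: decide which column names an expression refers to.
lhs_names <- function(expr) {
  e <- substitute(expr)  # capture the expression without evaluating it
  if (is.call(e) && identical(e[[1]], as.name("("))) {
    # Wrapped in parentheses: evaluate the inner expression in the caller
    # and use its value as the names.
    eval(e[[2]], parent.frame())
  } else if (is.name(e)) {
    # Bare symbol: use the symbol itself as a single name.
    as.character(e)
  } else {
    stop("unsupported expression")
  }
}

important_cols <- c("col1", "col2", "col5")
lhs_names(important_cols)    # "important_cols"
lhs_names((important_cols))  # "col1" "col2" "col5"
```

The parentheses do not "expand" anything by themselves; they simply change the shape of the expression tree, and the receiving function chooses to react to that.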

R: What are operators like %in% called and how can I learn about them?

I know the basics like == and !=, or even the difference (vaguely) between & and &&. But I have no idea about things like %in% and %%, or what is used in the context of sprintf(), like sprintf("%.2f", x).
Worst of all, they're hard to search for on the Internet because they're special characters and I don't know what they're called...
There are several different things going on here with the percent symbol:
Binary Operators
As several have already pointed out, things of the form %%, %in%, %*% are binary operators (respectively modulo, match, and matrix multiply), just like +, -, etc. They are functions that operate on two arguments that R recognizes as being special due to their name structure (starts and ends with a %). This allows you to use them in the form:
Argument1 %fun_name% Argument2
instead of the more traditional:
fun_name(Argument1, Argument2)
Keep in mind that the following are equivalent:
10 %% 2 == `%%`(10, 2)
"hello" %in% c("hello", "world") == `%in%`("hello", c("hello", "world"))
10 + 2 == `+`(10, 2)
R just recognizes the standard operators as well as the %x% operators as special and allows you to use them as traditional binary operators if you don't quote them. If you quote them (in the examples above with backticks), you can use them as standard two argument functions.
Custom Binary Operators
The big difference between the standard binary operators and %x% operators is that you can define custom binary operators and R will recognize them as special and treat them as binary operators:
`%samp%` <- function(e1, e2) sample(e1, e2)
1:10 %samp% 2
# [1] 1 9
Here we defined a binary operator version of the sample function. The two values shown are random, so your output will differ.
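One thing worth knowing when you define your own: all %x% operators share a single precedence level (higher than * and /, see ?Syntax) and group left to right. A small sketch, where %add% and %sub% are made-up operators for illustration:

```r
# Hypothetical operators that just wrap + and -.
`%add%` <- function(e1, e2) e1 + e2
`%sub%` <- function(e1, e2) e1 - e2

# %x% binds tighter than *, so this is 2 * (3 %add% 4), not (2 * 3) %add% 4.
2 * 3 %add% 4        # 14

# Left-to-right grouping: (1 %sub% 2) %sub% 3.
1 %sub% 2 %sub% 3    # -4
```

When in doubt, adding explicit parentheses avoids surprises.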
"%" (Percent) as a token in special function
The meaning of "%" in function like sprintf or format is completely different and has nothing to do with binary operators. The key thing to note is that in those functions the % character is part of a quoted string, and not a standard symbol on the command line (i.e. "%" and % are very different). In the context of sprintf, inside a string, "%" is a special character used to recognize that the subsequent characters have a special meaning and should not be interpreted as regular text. For example, in:
sprintf("I'm a number: %.2f", runif(3))
# [1] "I'm a number: 0.96" "I'm a number: 0.74" "I'm a number: 0.99"
"%.2f" means a floating point number (f) to be displayed with two decimals (.2). Notice how the "I'm a number: " piece is interpreted literally. The use of "%" allows sprintf users to mix literal text with special instructions on how to represent the other sprintf arguments.
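A few more format specifications next to the modulo operator, as a sketch to show the two uses of % side by side. The values are arbitrary.

```r
# Inside a string, % introduces a format specification for sprintf().
sprintf("%d items", 3L)                     # "3 items"
sprintf("%03d", 7L)                         # "007"
sprintf("%s scored %.1f%%", "A", 91.256)    # "A scored 91.3%"

# Outside a string, %% is the modulo operator.
7 %% 3                                      # 1
```

Note that a literal percent sign in a format string is written as %%, which has nothing to do with the %% operator.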
The R Language Definition, section 3.1.4 refers to them as "special binary operators". One of the ways they're special is that users can define new binary operators using the %x% syntax (where x is any valid name).
The Writing your own functions section of An Introduction to R, refers to them as Binary Operators (which is somewhat confusing because + is also a binary operator):
10.2 Defining new binary operators
Had we given the bslash() function a different name, namely one of the form
%anything%
it could have been used as a binary operator in expressions rather than in function form. Suppose, for example, we choose ! for the internal character. The function definition would then start as
> "%!%" <- function(X, y) { ... }
(Note the use of quote marks.) The function could then be used as X %!% y. (The backslash symbol itself is not a convenient choice as it presents special problems in this context.)
The matrix multiplication operator, %*%, and the outer product matrix operator %o% are other examples of binary operators defined in this way.
They don’t have a special name as far as I know. They are described in R operator syntax and precedence.
The %anything% operators are just normal functions, which can be defined by yourself. You do need to put the name of the operator in backticks (`…`), though: this is how R treats special names.
`%test%` = function (a, b) a * b
2 %test% 4
# [1] 8
The sprintf format strings are entirely unrelated, they are not operators at all. Instead, they are just the conventional C-style format strings.
The help file, and the general entry, is indeed a good starting point: ?'%in%'
For example, you can see how the operator '%in%' is defined:
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
You can even create your own operators:
'%ni%' <- Negate('%in%')
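A quick usage sketch for the negated operator, written with backticks here; the vectors are arbitrary examples.

```r
# "Not in": TRUE where the left-hand element is absent from the right-hand set.
`%ni%` <- Negate(`%in%`)

c(1, 5) %in% 1:3   # TRUE FALSE
c(1, 5) %ni% 1:3   # FALSE TRUE
```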
