Understanding the event and evidence parameters in bnlearns cpquery - r

I followed the example for cpquery(fitted, event, evidence, cluster, method = "ls", ..., debug = FALSE) given in the documentation:
#rm(list=ls())
library(bnlearn)
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)
# the result should be around 0.025.
cpquery(fitted, (B == "b"), (A == "a"))
# out: 0.02692665
It worked as advertised, but I have some confusion about the syntax. For reference, head(learning.test) looks like this:
A B C D E F
1 b c b a b b
2 b a c a b b
3 a a a a a a
4 a a a a b b
5 a a b c a a
6 c c a c c a
and the total number of rows is 5000.
Now to my confusion: The second argument to cpquery, (B == "b"), is passed to the event parameter and the third argument (A == "a") is passed to the evidence parameter. The documentation says about these two parameters that
The event and evidence arguments must be two expressions describing the event of interest and the conditioning evidence in a format such that, if we denote with data the data set the network was learned from, data[evidence, ] and data[event, ] return the correct observations.
I didn't quite understand how the syntax (B == "b") worked, since I thought B is not defined in the current scope. So given what was said in the documentation above, I took data to be learning.test in my case and then I tried to do learning.test[(B == "b"),], which indeed resulted in the error
Error in `[.data.frame`(learning.test, (B == "b"), ) :
object 'B' not found
Given the documentation I thought this line of code should've worked.
I then tried to do learning.test[(learning.test$B=="b"),] which does seem to return a data frame with all rows where the B-column has the value "b". I then tried to use this syntax in the cpquery function:
cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a"))
but to my surprise this resulted in the error
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 5000 instead of 10000.
I don't understand why I get this error (other than that something goes wrong in the logical sampling used by cpquery), since as far as I can see I am conforming to the quoted passage from the documentation.
So in conclusion I wonder why does the example from the documentation work when it seems to not conform to the quoted passage from the documentation, and why does my modification not work even though it seems to conform to the quoted passage in the documentation? (I am not very familiary with R so maybe this is very basic).

I think that this section is just trying to show what the evidence and event expressions represent in the original data and how they relate to logic sampling. e.g. if you subset the original data with the event/evidence, then it will subset the data to only contains rows that agree with the event/evidence.
But you are correct that the notation data[evidence, ] won't work. You can uselearning.test[eval(expression(B == "b"), learning.test), ], which may help you understand the intent. This is similar to what used to appear in older help files.
For logic sampling (the default inference method) a bunch of samples are generated from the BN, the default is N=10000 samples (*). So 1) you have estimated a parameterised BN, 2) draw N samples from it, 3) subset this sample according to event / evidence vectors to calculate some conditional probability (hopefully correctness is not lost in my brevity).
cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a")) doesn't work as the default number of samples in the sampled BN (step 2. above) is 10000 but learning.test has 5000 rows (and so the logical vector produced by (learning.test$B == "b") has length 5000). You can force your approach to work, (by work I mean produce an answer), by setting the n parameter , cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a"), n=5000) but this will not always do what you expect: you are evaluating the event/evidence expression against the observed data instead of the simulated data. I think that if this sort of logical vector is supplied then cpquery should probably throw an error. Best to stick with the documented approaches.
(*) From the help I'd expect the default number of samples to be 5000 * log10(nparams(fitted))) =~ 8000, but perhaps batch = 10000 takes precedence.

Related

What does the operator "<<-" do when used to alter dataframes inside user-defined functions? [duplicate]

What are the differences between the assignment operators = and <- in R?
I know that operators are slightly different, as this example shows
x <- y <- 5
x = y = 5
x = y <- 5
x <- y = 5
# Error in (x <- y) = 5 : could not find function "<-<-"
But is this the only difference?
The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:
median(x = 1:10)
x
## Error: object 'x' not found
In this case, x is declared within the scope of the function, so it does not exist in the user workspace.
median(x <- 1:10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
In this case, x is declared in the user workspace, so you can use it after the function call has been completed.
There is a general preference among the R community for using <- for assignment (other than in function signatures) for compatibility with (very) old versions of S-Plus. Note that the spaces help to clarify situations like
x<-3
# Does this mean assignment?
x <- 3
# Or less than?
x < -3
Most R IDEs have keyboard shortcuts to make <- easier to type. Ctrl + = in Architect, Alt + - in RStudio (Option + - under macOS), Shift + - (underscore) in emacs+ESS.
If you prefer writing = to <- but want to use the more common assignment symbol for publicly released code (on CRAN, for example), then you can use one of the tidy_* functions in the formatR package to automatically replace = with <-.
library(formatR)
tidy_source(text = "x=1:5", arrow = TRUE)
## x <- 1:5
The answer to the question "Why does x <- y = 5 throw an error but not x <- y <- 5?" is "It's down to the magic contained in the parser". R's syntax contains many ambiguous cases that have to be resolved one way or another. The parser chooses to resolve the bits of the expression in different orders depending on whether = or <- was used.
To understand what is happening, you need to know that assignment silently returns the value that was assigned. You can see that more clearly by explicitly printing, for example print(x <- 2 + 3).
Secondly, it's clearer if we use prefix notation for assignment. So
x <- 5
`<-`(x, 5) #same thing
y = 5
`=`(y, 5) #also the same thing
The parser interprets x <- y <- 5 as
`<-`(x, `<-`(y, 5))
We might expect that x <- y = 5 would then be
`<-`(x, `=`(y, 5))
but actually it gets interpreted as
`=`(`<-`(x, y), 5)
This is because = is lower precedence than <-, as shown on the ?Syntax help page.
What are the differences between the assignment operators = and <- in R?
As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:
…
‘-> ->>’ rightwards assignment
‘<- <<-’ assignment (right to left)
‘=’ assignment (right to left)
…
But is this the only difference?
Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:
The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:
x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1
Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?
It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):
The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.
So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.
In any piece of code of the general form …
‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)
… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:
if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …
Any of these will raise an error “unexpected '=' in ‹bla›”.
In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:
median((x = 1 : 10))
But also:
if (! (nf = length(from))) return()
Now you might object that such code is atrocious (and you may be right). But I took this code from the base::file.copy function (replacing <- with =) — it’s a pervasive pattern in much of the core R codebase.
The original explanation by John Chambers, which the the R documentation is probably based on, actually explains this correctly:
[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.
In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
Google's R style guide simplifies the issue by prohibiting the "=" for assignment. Not a bad choice.
https://google.github.io/styleguide/Rguide.xml
The R manual goes into nice detail on all 5 assignment operators.
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x = y = 5 is equivalent to x = (y = 5), because the assignment operators "group" right to left, which works. Meaning: assign 5 to y, leaving the number 5; and then assign that 5 to x.
This is not the same as (x = y) = 5, which doesn't work! Meaning: assign the value of y to x, leaving the value of y; and then assign 5 to, umm..., what exactly?
When you mix the different kinds of assignment operators, <- binds tighter than =. So x = y <- 5 is interpreted as x = (y <- 5), which is the case that makes sense.
Unfortunately, x <- y = 5 is interpreted as (x <- y) = 5, which is the case that doesn't work!
See ?Syntax and ?assignOps for the precedence (binding) and grouping rules.
According to John Chambers, the operator = is only allowed at "the top level," which means it is not allowed in control structures like if, making the following programming error illegal.
> if(x = 0) 1 else x
Error: syntax error
As he writes, "Disallowing the new assignment form [=] in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments."
You can manage to do this if it's "isolated from surrounding logical structure, by braces or an extra pair of parentheses," so if ((x = 0)) 1 else x would work.
See http://developer.r-project.org/equalAssign.html
From the official R documentation:
The operators <- and = assign into the environment in which they
are evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the
complete expression typed at the command prompt) or as one of the
subexpressions in a braced list of expressions.
This may also add to understanding of the difference between those two operators:
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
For the first element R has assigned values and proper name, while the name of the second element looks a bit strange.
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
R version 3.3.2 (2016-10-31); macOS Sierra 10.12.1
I am not sure if Patrick Burns book R inferno has been cited here where in 8.2.26 = is not a synonym of <- Patrick states "You clearly do not want to use '<-' when you want to set an argument of a function.". The book is available at https://www.burns-stat.com/documents/books/the-r-inferno/
There are some differences between <- and = in the past version of R or even the predecessor language of R (S language). But currently, it seems using = only like any other modern language (python, java) won't cause any problem. You can achieve some more functionality by using <- when passing a value to some augments while also creating a global variable at the same time but it may have weird/unwanted behavior like in
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
Highly recommended! Try to read this article which is the best article that tries to explain the difference between those two:
Check https://colinfay.me/r-assignment/
Also, think about <- as a function that invisibly returns a value.
a <- 2
(a <- 2)
#> [1] 2
See: https://adv-r.hadley.nz/functions.html

R data.frame colnames depending on assignment operator [duplicate]

What are the differences between the assignment operators = and <- in R?
I know that operators are slightly different, as this example shows
x <- y <- 5
x = y = 5
x = y <- 5
x <- y = 5
# Error in (x <- y) = 5 : could not find function "<-<-"
But is this the only difference?
The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:
median(x = 1:10)
x
## Error: object 'x' not found
In this case, x is declared within the scope of the function, so it does not exist in the user workspace.
median(x <- 1:10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
In this case, x is declared in the user workspace, so you can use it after the function call has been completed.
There is a general preference among the R community for using <- for assignment (other than in function signatures) for compatibility with (very) old versions of S-Plus. Note that the spaces help to clarify situations like
x<-3
# Does this mean assignment?
x <- 3
# Or less than?
x < -3
Most R IDEs have keyboard shortcuts to make <- easier to type. Ctrl + = in Architect, Alt + - in RStudio (Option + - under macOS), Shift + - (underscore) in emacs+ESS.
If you prefer writing = to <- but want to use the more common assignment symbol for publicly released code (on CRAN, for example), then you can use one of the tidy_* functions in the formatR package to automatically replace = with <-.
library(formatR)
tidy_source(text = "x=1:5", arrow = TRUE)
## x <- 1:5
The answer to the question "Why does x <- y = 5 throw an error but not x <- y <- 5?" is "It's down to the magic contained in the parser". R's syntax contains many ambiguous cases that have to be resolved one way or another. The parser chooses to resolve the bits of the expression in different orders depending on whether = or <- was used.
To understand what is happening, you need to know that assignment silently returns the value that was assigned. You can see that more clearly by explicitly printing, for example print(x <- 2 + 3).
Secondly, it's clearer if we use prefix notation for assignment. So
x <- 5
`<-`(x, 5) #same thing
y = 5
`=`(y, 5) #also the same thing
The parser interprets x <- y <- 5 as
`<-`(x, `<-`(y, 5))
We might expect that x <- y = 5 would then be
`<-`(x, `=`(y, 5))
but actually it gets interpreted as
`=`(`<-`(x, y), 5)
This is because = is lower precedence than <-, as shown on the ?Syntax help page.
What are the differences between the assignment operators = and <- in R?
As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:
…
‘-> ->>’ rightwards assignment
‘<- <<-’ assignment (right to left)
‘=’ assignment (right to left)
…
But is this the only difference?
Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:
The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:
x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1
Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?
It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):
The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.
So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.
In any piece of code of the general form …
‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)
… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:
if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …
Any of these will raise an error “unexpected '=' in ‹bla›”.
In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:
median((x = 1 : 10))
But also:
if (! (nf = length(from))) return()
Now you might object that such code is atrocious (and you may be right). But I took this code from the base::file.copy function (replacing <- with =) — it’s a pervasive pattern in much of the core R codebase.
The original explanation by John Chambers, which the the R documentation is probably based on, actually explains this correctly:
[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.
In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
Google's R style guide simplifies the issue by prohibiting the "=" for assignment. Not a bad choice.
https://google.github.io/styleguide/Rguide.xml
The R manual goes into nice detail on all 5 assignment operators.
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x = y = 5 is equivalent to x = (y = 5), because the assignment operators "group" right to left, which works. Meaning: assign 5 to y, leaving the number 5; and then assign that 5 to x.
This is not the same as (x = y) = 5, which doesn't work! Meaning: assign the value of y to x, leaving the value of y; and then assign 5 to, umm..., what exactly?
When you mix the different kinds of assignment operators, <- binds tighter than =. So x = y <- 5 is interpreted as x = (y <- 5), which is the case that makes sense.
Unfortunately, x <- y = 5 is interpreted as (x <- y) = 5, which is the case that doesn't work!
See ?Syntax and ?assignOps for the precedence (binding) and grouping rules.
According to John Chambers, the operator = is only allowed at "the top level," which means it is not allowed in control structures like if, making the following programming error illegal.
> if(x = 0) 1 else x
Error: syntax error
As he writes, "Disallowing the new assignment form [=] in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments."
You can manage to do this if it's "isolated from surrounding logical structure, by braces or an extra pair of parentheses," so if ((x = 0)) 1 else x would work.
See http://developer.r-project.org/equalAssign.html
From the official R documentation:
The operators <- and = assign into the environment in which they
are evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the
complete expression typed at the command prompt) or as one of the
subexpressions in a braced list of expressions.
This may also add to understanding of the difference between those two operators:
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
For the first element R has assigned values and proper name, while the name of the second element looks a bit strange.
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
R version 3.3.2 (2016-10-31); macOS Sierra 10.12.1
I am not sure if Patrick Burns book R inferno has been cited here where in 8.2.26 = is not a synonym of <- Patrick states "You clearly do not want to use '<-' when you want to set an argument of a function.". The book is available at https://www.burns-stat.com/documents/books/the-r-inferno/
There are some differences between <- and = in the past version of R or even the predecessor language of R (S language). But currently, it seems using = only like any other modern language (python, java) won't cause any problem. You can achieve some more functionality by using <- when passing a value to some augments while also creating a global variable at the same time but it may have weird/unwanted behavior like in
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
Highly recommended! Try to read this article which is the best article that tries to explain the difference between those two:
Check https://colinfay.me/r-assignment/
Also, think about <- as a function that invisibly returns a value.
a <- 2
(a <- 2)
#> [1] 2
See: https://adv-r.hadley.nz/functions.html

R dplyr mutate colnames [duplicate]

What are the differences between the assignment operators = and <- in R?
I know that operators are slightly different, as this example shows
x <- y <- 5
x = y = 5
x = y <- 5
x <- y = 5
# Error in (x <- y) = 5 : could not find function "<-<-"
But is this the only difference?
The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:
median(x = 1:10)
x
## Error: object 'x' not found
In this case, x is declared within the scope of the function, so it does not exist in the user workspace.
median(x <- 1:10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
In this case, x is declared in the user workspace, so you can use it after the function call has been completed.
There is a general preference among the R community for using <- for assignment (other than in function signatures) for compatibility with (very) old versions of S-Plus. Note that the spaces help to clarify situations like
x<-3
# Does this mean assignment?
x <- 3
# Or less than?
x < -3
Most R IDEs have keyboard shortcuts to make <- easier to type. Ctrl + = in Architect, Alt + - in RStudio (Option + - under macOS), Shift + - (underscore) in emacs+ESS.
If you prefer writing = to <- but want to use the more common assignment symbol for publicly released code (on CRAN, for example), then you can use one of the tidy_* functions in the formatR package to automatically replace = with <-.
library(formatR)
tidy_source(text = "x=1:5", arrow = TRUE)
## x <- 1:5
The answer to the question "Why does x <- y = 5 throw an error but not x <- y <- 5?" is "It's down to the magic contained in the parser". R's syntax contains many ambiguous cases that have to be resolved one way or another. The parser chooses to resolve the bits of the expression in different orders depending on whether = or <- was used.
To understand what is happening, you need to know that assignment silently returns the value that was assigned. You can see that more clearly by explicitly printing, for example print(x <- 2 + 3).
Secondly, it's clearer if we use prefix notation for assignment. So
x <- 5
`<-`(x, 5) #same thing
y = 5
`=`(y, 5) #also the same thing
The parser interprets x <- y <- 5 as
`<-`(x, `<-`(y, 5))
We might expect that x <- y = 5 would then be
`<-`(x, `=`(y, 5))
but actually it gets interpreted as
`=`(`<-`(x, y), 5)
This is because = is lower precedence than <-, as shown on the ?Syntax help page.
What are the differences between the assignment operators = and <- in R?
As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:
…
‘-> ->>’ rightwards assignment
‘<- <<-’ assignment (right to left)
‘=’ assignment (right to left)
…
But is this the only difference?
Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:
The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:
x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1
Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?
It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):
The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.
So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.
In any piece of code of the general form …
‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)
… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:
if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …
Any of these will raise an error “unexpected '=' in ‹bla›”.
In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:
median((x = 1 : 10))
But also:
if (! (nf = length(from))) return()
Now you might object that such code is atrocious (and you may be right). But I took this code from the base::file.copy function (replacing <- with =) — it’s a pervasive pattern in much of the core R codebase.
The original explanation by John Chambers, which the the R documentation is probably based on, actually explains this correctly:
[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.
In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
Google's R style guide simplifies the issue by prohibiting the "=" for assignment. Not a bad choice.
https://google.github.io/styleguide/Rguide.xml
The R manual goes into nice detail on all 5 assignment operators.
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x = y = 5 is equivalent to x = (y = 5), because the assignment operators "group" right to left, which works. Meaning: assign 5 to y, leaving the number 5; and then assign that 5 to x.
This is not the same as (x = y) = 5, which doesn't work! Meaning: assign the value of y to x, leaving the value of y; and then assign 5 to, umm..., what exactly?
When you mix the different kinds of assignment operators, <- binds tighter than =. So x = y <- 5 is interpreted as x = (y <- 5), which is the case that makes sense.
Unfortunately, x <- y = 5 is interpreted as (x <- y) = 5, which is the case that doesn't work!
See ?Syntax and ?assignOps for the precedence (binding) and grouping rules.
According to John Chambers, the operator = is only allowed at "the top level," which means it is not allowed in control structures like if, making the following programming error illegal.
> if(x = 0) 1 else x
Error: syntax error
As he writes, "Disallowing the new assignment form [=] in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments."
You can manage to do this if it's "isolated from surrounding logical structure, by braces or an extra pair of parentheses," so if ((x = 0)) 1 else x would work.
See http://developer.r-project.org/equalAssign.html
From the official R documentation:
The operators <- and = assign into the environment in which they
are evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the
complete expression typed at the command prompt) or as one of the
subexpressions in a braced list of expressions.
This may also add to understanding of the difference between those two operators:
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
For the first element R has assigned values and proper name, while the name of the second element looks a bit strange.
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
R version 3.3.2 (2016-10-31); macOS Sierra 10.12.1
I am not sure if Patrick Burns book R inferno has been cited here where in 8.2.26 = is not a synonym of <- Patrick states "You clearly do not want to use '<-' when you want to set an argument of a function.". The book is available at https://www.burns-stat.com/documents/books/the-r-inferno/
There are some differences between <- and = in the past version of R or even the predecessor language of R (S language). But currently, it seems using = only like any other modern language (python, java) won't cause any problem. You can achieve some more functionality by using <- when passing a value to some augments while also creating a global variable at the same time but it may have weird/unwanted behavior like in
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
Highly recommended! Try to read this article which is the best article that tries to explain the difference between those two:
Check https://colinfay.me/r-assignment/
Also, think about <- as a function that invisibly returns a value.
a <- 2
(a <- 2)
#> [1] 2
See: https://adv-r.hadley.nz/functions.html

R <- or = operators, why is <- necessary in this case? [duplicate]

What are the differences between the assignment operators = and <- in R?
I know that operators are slightly different, as this example shows
x <- y <- 5
x = y = 5
x = y <- 5
x <- y = 5
# Error in (x <- y) = 5 : could not find function "<-<-"
But is this the only difference?
The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:
median(x = 1:10)
x
## Error: object 'x' not found
In this case, x is declared within the scope of the function, so it does not exist in the user workspace.
median(x <- 1:10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
In this case, x is declared in the user workspace, so you can use it after the function call has been completed.
There is a general preference among the R community for using <- for assignment (other than in function signatures) for compatibility with (very) old versions of S-Plus. Note that the spaces help to clarify situations like
x<-3
# Does this mean assignment?
x <- 3
# Or less than?
x < -3
Most R IDEs have keyboard shortcuts to make <- easier to type. Ctrl + = in Architect, Alt + - in RStudio (Option + - under macOS), Shift + - (underscore) in emacs+ESS.
If you prefer writing = to <- but want to use the more common assignment symbol for publicly released code (on CRAN, for example), then you can use one of the tidy_* functions in the formatR package to automatically replace = with <-.
library(formatR)
tidy_source(text = "x=1:5", arrow = TRUE)
## x <- 1:5
The answer to the question "Why does x <- y = 5 throw an error but not x <- y <- 5?" is "It's down to the magic contained in the parser". R's syntax contains many ambiguous cases that have to be resolved one way or another. The parser chooses to resolve the bits of the expression in different orders depending on whether = or <- was used.
To understand what is happening, you need to know that assignment silently returns the value that was assigned. You can see that more clearly by explicitly printing, for example print(x <- 2 + 3).
Secondly, it's clearer if we use prefix notation for assignment. So
x <- 5
`<-`(x, 5) #same thing
y = 5
`=`(y, 5) #also the same thing
The parser interprets x <- y <- 5 as
`<-`(x, `<-`(y, 5))
We might expect that x <- y = 5 would then be
`<-`(x, `=`(y, 5))
but actually it gets interpreted as
`=`(`<-`(x, y), 5)
This is because = is lower precedence than <-, as shown on the ?Syntax help page.
What are the differences between the assignment operators = and <- in R?
As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:
…
‘-> ->>’ rightwards assignment
‘<- <<-’ assignment (right to left)
‘=’ assignment (right to left)
…
But is this the only difference?
Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:
The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:
x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1
Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?
It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):
The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.
So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.
In any piece of code of the general form …
‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)
… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:
if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …
Any of these will raise an error “unexpected '=' in ‹bla›”.
In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:
median((x = 1 : 10))
But also:
if (! (nf = length(from))) return()
Now you might object that such code is atrocious (and you may be right). But I took this code from the base::file.copy function (replacing <- with =) — it’s a pervasive pattern in much of the core R codebase.
The original explanation by John Chambers, which the the R documentation is probably based on, actually explains this correctly:
[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.
In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
Google's R style guide simplifies the issue by prohibiting the "=" for assignment. Not a bad choice.
https://google.github.io/styleguide/Rguide.xml
The R manual goes into nice detail on all 5 assignment operators.
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x = y = 5 is equivalent to x = (y = 5), because the assignment operators "group" right to left, which works. Meaning: assign 5 to y, leaving the number 5; and then assign that 5 to x.
This is not the same as (x = y) = 5, which doesn't work! Meaning: assign the value of y to x, leaving the value of y; and then assign 5 to, umm..., what exactly?
When you mix the different kinds of assignment operators, <- binds tighter than =. So x = y <- 5 is interpreted as x = (y <- 5), which is the case that makes sense.
Unfortunately, x <- y = 5 is interpreted as (x <- y) = 5, which is the case that doesn't work!
See ?Syntax and ?assignOps for the precedence (binding) and grouping rules.
According to John Chambers, the operator = is only allowed at "the top level," which means it is not allowed in control structures like if, making the following programming error illegal.
> if(x = 0) 1 else x
Error: syntax error
As he writes, "Disallowing the new assignment form [=] in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments."
You can manage to do this if it's "isolated from surrounding logical structure, by braces or an extra pair of parentheses," so if ((x = 0)) 1 else x would work.
See http://developer.r-project.org/equalAssign.html
From the official R documentation:
The operators <- and = assign into the environment in which they
are evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the
complete expression typed at the command prompt) or as one of the
subexpressions in a braced list of expressions.
This may also add to understanding of the difference between those two operators:
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
For the first element R has assigned values and proper name, while the name of the second element looks a bit strange.
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
R version 3.3.2 (2016-10-31); macOS Sierra 10.12.1
I am not sure if Patrick Burns book R inferno has been cited here where in 8.2.26 = is not a synonym of <- Patrick states "You clearly do not want to use '<-' when you want to set an argument of a function.". The book is available at https://www.burns-stat.com/documents/books/the-r-inferno/
There are some differences between <- and = in the past version of R or even the predecessor language of R (S language). But currently, it seems using = only like any other modern language (python, java) won't cause any problem. You can achieve some more functionality by using <- when passing a value to some augments while also creating a global variable at the same time but it may have weird/unwanted behavior like in
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
Highly recommended! Try to read this article which is the best article that tries to explain the difference between those two:
Check https://colinfay.me/r-assignment/
Also, think about <- as a function that invisibly returns a value.
a <- 2
(a <- 2)
#> [1] 2
See: https://adv-r.hadley.nz/functions.html

What are the differences between "=" and "<-" assignment operators?

What are the differences between the assignment operators = and <- in R?
I know that operators are slightly different, as this example shows
x <- y <- 5
x = y = 5
x = y <- 5
x <- y = 5
# Error in (x <- y) = 5 : could not find function "<-<-"
But is this the only difference?
The difference in assignment operators is clearer when you use them to set an argument value in a function call. For example:
median(x = 1:10)
x
## Error: object 'x' not found
In this case, x is declared within the scope of the function, so it does not exist in the user workspace.
median(x <- 1:10)
x
## [1] 1 2 3 4 5 6 7 8 9 10
In this case, x is declared in the user workspace, so you can use it after the function call has been completed.
There is a general preference among the R community for using <- for assignment (other than in function signatures) for compatibility with (very) old versions of S-Plus. Note that the spaces help to clarify situations like
x<-3
# Does this mean assignment?
x <- 3
# Or less than?
x < -3
Most R IDEs have keyboard shortcuts to make <- easier to type. Ctrl + = in Architect, Alt + - in RStudio (Option + - under macOS), Shift + - (underscore) in emacs+ESS.
If you prefer writing = to <- but want to use the more common assignment symbol for publicly released code (on CRAN, for example), then you can use one of the tidy_* functions in the formatR package to automatically replace = with <-.
library(formatR)
tidy_source(text = "x=1:5", arrow = TRUE)
## x <- 1:5
The answer to the question "Why does x <- y = 5 throw an error but not x <- y <- 5?" is "It's down to the magic contained in the parser". R's syntax contains many ambiguous cases that have to be resolved one way or another. The parser chooses to resolve the bits of the expression in different orders depending on whether = or <- was used.
To understand what is happening, you need to know that assignment silently returns the value that was assigned. You can see that more clearly by explicitly printing, for example print(x <- 2 + 3).
Secondly, it's clearer if we use prefix notation for assignment. So
x <- 5
`<-`(x, 5) #same thing
y = 5
`=`(y, 5) #also the same thing
The parser interprets x <- y <- 5 as
`<-`(x, `<-`(y, 5))
We might expect that x <- y = 5 would then be
`<-`(x, `=`(y, 5))
but actually it gets interpreted as
`=`(`<-`(x, y), 5)
This is because = is lower precedence than <-, as shown on the ?Syntax help page.
What are the differences between the assignment operators = and <- in R?
As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:
…
‘-> ->>’ rightwards assignment
‘<- <<-’ assignment (right to left)
‘=’ assignment (right to left)
…
But is this the only difference?
Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:
The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.
Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:
x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1
Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?
It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):
The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.
So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.
In any piece of code of the general form …
‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)
… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:
if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …
Any of these will raise an error “unexpected '=' in ‹bla›”.
In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:
median((x = 1 : 10))
But also:
if (! (nf = length(from))) return()
Now you might object that such code is atrocious (and you may be right). But I took this code from the base::file.copy function (replacing <- with =) — it’s a pervasive pattern in much of the core R codebase.
The original explanation by John Chambers, which the the R documentation is probably based on, actually explains this correctly:
[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.
In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
Google's R style guide simplifies the issue by prohibiting the "=" for assignment. Not a bad choice.
https://google.github.io/styleguide/Rguide.xml
The R manual goes into nice detail on all 5 assignment operators.
http://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html
x = y = 5 is equivalent to x = (y = 5), because the assignment operators "group" right to left, which works. Meaning: assign 5 to y, leaving the number 5; and then assign that 5 to x.
This is not the same as (x = y) = 5, which doesn't work! Meaning: assign the value of y to x, leaving the value of y; and then assign 5 to, umm..., what exactly?
When you mix the different kinds of assignment operators, <- binds tighter than =. So x = y <- 5 is interpreted as x = (y <- 5), which is the case that makes sense.
Unfortunately, x <- y = 5 is interpreted as (x <- y) = 5, which is the case that doesn't work!
See ?Syntax and ?assignOps for the precedence (binding) and grouping rules.
According to John Chambers, the operator = is only allowed at "the top level," which means it is not allowed in control structures like if, making the following programming error illegal.
> if(x = 0) 1 else x
Error: syntax error
As he writes, "Disallowing the new assignment form [=] in control expressions avoids programming errors (such as the example above) that are more likely with the equal operator than with other S assignments."
You can manage to do this if it's "isolated from surrounding logical structure, by braces or an extra pair of parentheses," so if ((x = 0)) 1 else x would work.
See http://developer.r-project.org/equalAssign.html
From the official R documentation:
The operators <- and = assign into the environment in which they
are evaluated. The operator <- can be used anywhere, whereas the
operator = is only allowed at the top level (e.g., in the
complete expression typed at the command prompt) or as one of the
subexpressions in a braced list of expressions.
This may also add to understanding of the difference between those two operators:
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
For the first element R has assigned values and proper name, while the name of the second element looks a bit strange.
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
R version 3.3.2 (2016-10-31); macOS Sierra 10.12.1
I am not sure if Patrick Burns book R inferno has been cited here where in 8.2.26 = is not a synonym of <- Patrick states "You clearly do not want to use '<-' when you want to set an argument of a function.". The book is available at https://www.burns-stat.com/documents/books/the-r-inferno/
There are some differences between <- and = in the past version of R or even the predecessor language of R (S language). But currently, it seems using = only like any other modern language (python, java) won't cause any problem. You can achieve some more functionality by using <- when passing a value to some augments while also creating a global variable at the same time but it may have weird/unwanted behavior like in
df <- data.frame(
a = rnorm(10),
b <- rnorm(10)
)
str(df)
# 'data.frame': 10 obs. of 2 variables:
# $ a : num 0.6393 1.125 -1.2514 0.0729 -1.3292 ...
# $ b....rnorm.10.: num 0.2485 0.0391 -1.6532 -0.3366 1.1951 ...
Highly recommended! Try to read this article which is the best article that tries to explain the difference between those two:
Check https://colinfay.me/r-assignment/
Also, think about <- as a function that invisibly returns a value.
a <- 2
(a <- 2)
#> [1] 2
See: https://adv-r.hadley.nz/functions.html

Resources