The arcane formals(function(x){})$x - r

What is the object formals(function(x){})$x?
It's found in the formals of a function, bound to arguments without default value.
Is there any other way to refer to this strange object? Does it have some role other than representing an empty function argument?
Here are some of its properties that can be checked in the console:
> is(formals(function(x){})$x)
[1] "name" "language" "refObject"
> formals(function(x){})$x
> as.character(formals(function(x){})$x)
[1] ""
EDIT: Here are some other ways to get this object:
alist(,)[[1]]
bquote()
quote(expr=)

Background: What is formals(function(x) {})?
Well, to start with (and as documented in ?formals) , formals(function(x) {}) returns a pairlist:
is(formals(function(x){}))
# [1] "pairlist"
Unlike list objects, pairlist objects can have named elements that contain no value -- a very nice thing when constructing a function that has a possibly optional formal argument. From ?pairlist:
tagged arguments with no value are allowed whereas ‘list’ simply ignores them.
To see the difference, compare alist(), which creates pairlists, with list() which constructs 'plain old' lists:
list(x=, y=2)
# Error in list(x = , y = 2) : argument 1 is empty
alist(x=, y=2)
# $x
#
# $y
# [1] 2
Your question: What is formals(function(x) {})$x?
Now to your question about what formals(function(x) {})$x is. My understanding is in some sense its real value is the "empty symbol". You can't, however, get at it from within R because the "empty symbol" is an object that R's developers -- very much by design -- try to entirely hide from R users. (For an interesting discussion of the empty symbol, and why it's kept hidden, see the thread starting here).
When one tries to get at it by indexing an empty-valued element of a pairlist, R's developers foil the attempt by having R return the name of the element instead of its verbotten-for-public-viewing value. (This is, of course, the name object shown in your question).

It's a name or symbol, see ?name, e.g.:
is(as.name('a'))
#[1] "name" "language" "refObject"
The only difference from your example is that you can't use as.name to create an empty one.

Related

Strange as.Date() behavior

I'm using R 4.2.1 with all packages updated to the latest version.
The two lines below differ only in the order of the elements in a concatenated vector, yet the output is completely different.
as.Date(c(Sys.Date(), "2020-09-09"))
as.Date(c("2020-09-09", Sys.Date()))
The output is:
> as.Date(c(Sys.Date(), "2020-09-09"))
[1] "2022-09-16" "2020-09-09"
> as.Date(c("2020-09-09", Sys.Date()))
[1] "2020-09-09" NA
The first line correctly coerces the system date as a string, and the second line coerces it first as a numeric value and then as a string, but I have never before run into a situation where coercion in R depends on the order of elements in a vector...
Can someone explain to me why coercion rules behave this way and where I can read more about it...
And what can I do in a situation when the type of elements inside c() is not known a priori?
Thank you!
The default c() unclasses each argument before combining them (unclass(Sys.Date()) is 19251 [as of today]); this is because "all attributes except names are removed" by (at least the default version of) c(), which includes the class.
The reason for the difference in orders is that c() is an S3 generic function, which means that it dispatches on the class of its first argument, so c(<date>, <character>) calls c.Date(), while c(<character>, <date>) calls the generic version of c() (which falls through to a primitive function in C which I don't want to bother digging through).
The code of c.Date:
function (..., recursive = FALSE)
.Date(c(unlist(lapply(list(...), function(e) unclass(as.Date(e))))))
in other words, it coerces everything to a date, then unclasses it, then turns the vector back to dates once everything is concatenated ...
A possible workaround/solution is to call c.Date() explicitly, if you know that's what you want ...

R unexpectedly auto-completing data.frame column names [duplicate]

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still an R newbie and this is new for me, partial matching of function arguments:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does this have any side effects? I want understand how can someone test if the xx argument was given?
Edit after Andrie comment:
Internally R uses pmatch algorithm to match argument, here an example how this works:
pmatch("me", c("mean", "median", "mode")) # error multiple partial matches
[1] NA
> pmatch("mo", c("mean", "median", "mode")) # mo match mode match here
[1] 3
But why R has such feature? What is the basic idea behind of partial unique matching?
Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
Note that when subsetting a tibble
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

How to edit multi-person objects in R

I find the following behaviour of the R person object rather unexpected:
Let's create a multi-person object:
a = c(person("Huck", "Finn"), person("Tom", "Sawyer"))
Imagine we want to update the given name of one person in the object:
a[[1]]$given <- 'Huckleberry'
Then if we inspect our object, to my surprise we have:
> a
[1] " <> [] ()" "Tom Sawyer"
Where'd Huckleberry Finn go?! (Note that if we try this with just a single person object, it works fine.) Why does this happen?
How can we do the above so that we get the more logical behavior of correcting just the first name?
The syntax you want here is
a <- c(person("Huck", "Finn"), person("Tom", "Sawyer"))
a[1]$given<-"Huckleberry"
a
#[1] "Huckleberry Finn" "Tom Sawyer"
A group of people is still a "person" and it has it's own special indexing function [.person and concat function c.person so it has perhaps different behavior than you were expecting. The problem was that [[ ]] was messing with the underlying hidden list.
Actually, it's interesting because they've overloaded nearly all the indexing methods for person but not the [<- or [[<- and that's really what's causing the error. Because up to here, we're the same
`$<-`(`[`(a,1), "given", "Huckleberry") #works
`$<-`(`[[`(a,1), "given", "Huckleberry") #works
but when we get to
`[<-`(a, 1, `$<-`(`[`(a,1), "given", "Huckleberry")) #works
`[[<-`(a, 1, `$<-`(`[[`(a,1), "given", "Huckleberry")) #no work
we see a difference. The special wrapping/unwrapping that happens during retrieval does not happen during assignment.
So what's going on is that a "person" is always a list of lists. The outer list holds all the people and the inner lists hold the data. You can think of the data like this
x<-list(
list(name="a"),list(name="b")
)
y<-list(
list(name="c")
)
where x is a collection of two people and y is a "single" person. When you do
x[1]<-y
x
you end up with
list(
list(name="c"),list(name="b")
)
since you're replacing a list with a list which is how [ indexing works with lists. But if you try to replace the element at [[1]] with a list of lists, that list will get nested. For example
x[[1]]<-y
x
becomes
x<-list(
list(list(name="c")),list(name="b")
)
And that extra level of nesting is what's confusing R when it goes to print the person in the first position. That first person won't have any named elements at the second level, so when it goes to print, it will return
emptyp <- structure(list(structure(list(), class="person")), class="person")
utils:::format.person(emptyp)
# " <> [] ()"
which gives is the symbols where it's trying to place the name, e-mail address, role, and comment.

Why $ guesses the column names when used for data frame? [duplicate]

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still an R newbie and this is new for me, partial matching of function arguments:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does this have any side effects? I want understand how can someone test if the xx argument was given?
Edit after Andrie comment:
Internally R uses pmatch algorithm to match argument, here an example how this works:
pmatch("me", c("mean", "median", "mode")) # error multiple partial matches
[1] NA
> pmatch("mo", c("mean", "median", "mode")) # mo match mode match here
[1] 3
But why R has such feature? What is the basic idea behind of partial unique matching?
Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
Note that when subsetting a tibble
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

Why does R use partial matching?

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still an R newbie and this is new for me, partial matching of function arguments:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does this have any side effects? I want understand how can someone test if the xx argument was given?
Edit after Andrie comment:
Internally R uses pmatch algorithm to match argument, here an example how this works:
pmatch("me", c("mean", "median", "mode")) # error multiple partial matches
[1] NA
> pmatch("mo", c("mean", "median", "mode")) # mo match mode match here
[1] 3
But why R has such feature? What is the basic idea behind of partial unique matching?
Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
Note that when subsetting a tibble
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

Resources