Why does $ guess the column names when used on a data frame? [duplicate] - r

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still new to R, and partial matching of function arguments is new to me:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does it have any side effects? And how can someone test whether the xx argument was given?
Edit after Andrie's comment:
Internally, R uses the pmatch algorithm to match arguments. Here is an example of how it works:
pmatch("me", c("mean", "median", "mode")) # ambiguous: "me" is a prefix of both "mean" and "median", so the result is NA (not an error)
[1] NA
pmatch("mo", c("mean", "median", "mode")) # unique: "mo" matches only "mode"
[1] 3
But why does R have such a feature? What is the basic idea behind unique partial matching?

Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
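To make the risk concrete, here is a minimal sketch. The functions report_v1 and report_v2 and their arguments are hypothetical; they stand in for a package function before and after it gains a new argument.

```r
# Hypothetical function as first released: only one argument starts with "ver".
report_v1 <- function(x, verbose = FALSE) {
  if (verbose) "verbose output" else "quiet output"
}

# A later (hypothetical) release adds `version`, which shares the prefix "ver".
report_v2 <- function(x, verbose = FALSE, version = 1) {
  if (verbose) "verbose output" else "quiet output"
}

# The abbreviation is unambiguous against the first definition ...
old_result <- report_v1(1, ver = TRUE)

# ... but the same call fails against the second, because "ver" now
# partially matches both `verbose` and `version`.
new_result <- tryCatch(
  report_v2(1, ver = TRUE),
  error = function(e) "error: ambiguous partial match"
)

# Spelling the name out in full works with both definitions.
full_result <- report_v2(1, verbose = TRUE)
```

Code that was correct when written breaks without being edited, which is why full argument names are the safer choice outside interactive sessions.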
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
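The rules in this passage can be sketched as follows, using the aa/aabb names from the quoted text; the other variable names are made up for illustration. One caveat: as far as I can tell, ?Extract in current versions of R documents exact = TRUE as the default for [[, rather than the NA default described in the passage, and the comments below assume that.

```r
x <- list(aabb = 1)

dollar_value  <- x$aa                      # exact match fails, unique prefix matches
strict_value  <- x[["aa"]]                 # assuming exact = TRUE by default: no match
relaxed_value <- x[["aa", exact = FALSE]]  # partial matching explicitly allowed
single_value  <- x["aa"]                   # `[` always requires an exact match

# Two names sharing the prefix make the partial match ambiguous.
y <- list(aabb = 1, aacc = 2)
ambiguous_value <- y$aa

# Replacement never partial-matches: this adds a new element named "aa"
# and leaves "aabb" untouched.
x$aa <- 99
```

After the last line, x has two elements, aabb and aa, which shows why partial matching applies only to extraction.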
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
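A short sketch of these passes, reusing the fumble/fooey example from the quoted text. The function bodies and the helper g are illustrative assumptions. The last part also addresses the question of how to test whether xx was given, using missing() and match.call().

```r
f <- function(fumble, fooey) c(fumble = fumble, fooey = fooey)

# Pass 1 binds `fooey` exactly; pass 2 then matches "f" to the only
# remaining formal, `fumble`.
legal <- f(f = 1, fooey = 2)

# Here both tags go through partial matching, and "f" is a prefix of
# both formals, so the call is an error.
illegal <- tryCatch(f(f = 1, fo = 2), error = function(e) "error")

# Formals that follow `...` must be named in full: "se" is not matched
# to `sep` and ends up inside `...` instead.
g <- function(..., sep = " ") list(dots = list(...), sep = sep)
res <- g("a", se = "-")

# missing() reports whether a formal was supplied, however it was matched,
# and match.call() shows the call with full argument names.
h <- function(xx = 2) list(supplied = !missing(xx), call = match.call())
abbreviated <- h(x = 2)
defaulted   <- h()
```

So h(x = 2) is treated exactly like h(xx = 2): inside the function there is no difference between an abbreviated and a fully spelled argument.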
Note that tibbles behave differently when subsetting:
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

Related

Why R is returning object when names are partially matching? [duplicate]

Say I have the list below:
Dat = list('AAA' = 1:4, 'BBB' = 5:9)
Now I run the following:
Dat$AA
## [1] 1 2 3 4
My question is why R returns a value for Dat$AA, given that there is no element with that name. Is R matching names partially?
If so, I think this behaviour is very risky and should not be allowed.
According to ?Extract
Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
Also, it is written under name
name - A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
where the usage is
x$name
Because exact is FALSE in that equivalence, $ allows partial matching. This is one reason to prefer [[, whose exact argument defaults to TRUE, as the usage shows:
x[[i, exact = TRUE]]
It is also mentioned in ?Extract how to change the options
Thus the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by $. Even in that case, warnings can be switched on by options(warnPartialMatchDollar = TRUE).
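The phrase "recursive objects (except environments)" can be checked with a small sketch. It reuses the Dat list from the question; the other variable names are made up for illustration.

```r
Dat <- list(AAA = 1:4, BBB = 5:9)
from_list <- Dat$AA   # list: partial match succeeds

env <- list2env(Dat)  # same names, stored in an environment
from_env <- env$AA    # environment: exact names only, so NULL

vec <- c(AAA = 1, BBB = 2)
# `$` is not defined for atomic vectors at all, so this is an error.
from_vector <- tryCatch(vec$AA, error = function(e) "error")
```

Only the list lookup returns the data; the environment lookup gives NULL and the atomic vector lookup fails outright.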
akrun already explained that this behaviour is documented, but generally we prefer that this doesn't happen. Consequently, a set of lines that I always put into my .Rprofile is the following:
options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchDollar = TRUE,
  warnPartialMatchAttr = TRUE
)
This runs when R starts and generates a warning whenever partial matching occurs. As the options show, it happens in three contexts: in function arguments, when using $, and when using attr(). I don't think you can turn partial matching off entirely, but this highlights it whenever it happens and helps prevent the ensuing bugs.
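As a rough illustration of what these options do, the sketch below turns two of them on, captures the resulting warnings, and then restores the previous settings. The helper warning_text is a hypothetical name introduced here, and the exact wording of the warning messages may differ between R versions.

```r
old <- options(
  warnPartialMatchArgs   = TRUE,
  warnPartialMatchDollar = TRUE
)

# Hypothetical helper: returns the warning message, or NA if there was none.
warning_text <- function(expr) {
  tryCatch({ expr; NA_character_ }, warning = function(w) conditionMessage(w))
}

Dat <- list(AAA = 1:4, BBB = 5:9)
h <- function(xx = 2) xx

dollar_msg <- warning_text(Dat$AA)    # partial match via `$`
args_msg   <- warning_text(h(x = 2))  # partial match of an argument name
exact_msg  <- warning_text(Dat$AAA)   # exact name: no warning, so NA

options(old)  # restore the previous settings
```

Setting the options inside a script and restoring them afterwards, as done here, limits the effect to the code under test, whereas putting them in .Rprofile applies them to the whole session.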

R seems to ignore part of variable name after underscore

I encountered a strange problem with R. I have a dataframe with several variables. I add a variable to this dataframe that contains an underscore, for example:
allres$tmp_weighted <- allres$day * allres$area
Before I do this, R tells me that the variable allres$tmp does not exist (which is right). However, after I add allres$tmp_weighted to the dataframe and call allres$tmp, I get the data for allres$tmp_weighted. It seems as if the part after the underscore does not matter to R at all. I tried it with several other variables / names and it always works that way.
I don't think this should work like this? Am I overlooking something here? Below I pasted some code together with output from the Console.
# first check whether the variable exists
allres_sw$Ndpsw
## NULL
# define a new variable with an underscore in its name
allres_sw$Ndpsw_weighted <- allres_sw$Ndepswcrit * allres_sw$Area
# check again whether the variable exists
allres_sw$Ndpsw
## [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
# this is the output that I would expect from "Ndpsw_weighted" - and indeed do get
allres_sw$Ndpsw_weighted
## [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
Have a look at ?`[` or ?`$` in your R console. The description of the name argument of the extract functions states that names are partially matched when using the $ operator (as opposed to the `[[` operator, which matches exactly by default because its exact argument defaults to TRUE).
From ?`$`
A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
Just to expand somewhat on Wil's answer... From help('$'):
x$name
name
A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
exact
Controls possible partial matching of [[ when extracting by a character vector (for most objects, but see under ‘Environments’). The default is no partial matching. Value NA allows partial matching but issues a warning when it occurs. Value FALSE allows partial matching without any warning.
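The key phrase here is partial match (see pmatch). You'll understand now that the underscore is nothing special: you can abbreviate allres_sw$Ndpsw_weighted to allres_sw$Ndp, provided no other column name starts with the same prefix. Going one character shorter does not work, because Nd is also a prefix of Ndepswcrit, so allres_sw$Nd is ambiguous and returns NULL.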
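The prefix rule can be tried out on a small stand-in for the data frame from the question. The column names are taken from the question, but the values below are made up, and the variable names are illustrative.

```r
cols <- c("Ndepswcrit", "Area", "Ndpsw_weighted")

unique_index    <- pmatch("Ndp", cols)  # prefix of "Ndpsw_weighted" only
ambiguous_index <- pmatch("Nd", cols)   # prefix of two names, so NA

# Made-up values; only the column names come from the question.
allres_sw <- data.frame(Ndepswcrit = c(1, 2), Area = c(10, 20))
allres_sw$Ndpsw_weighted <- allres_sw$Ndepswcrit * allres_sw$Area

unique_prefix    <- allres_sw$Ndp  # resolves to Ndpsw_weighted
ambiguous_prefix <- allres_sw$Nd   # two candidates, so NULL
```

The underscore plays no role: what matters is only whether the characters typed are a prefix of exactly one column name.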

Does R ignore variable name extensions starting with a dot in a data frame?

I have a data frame where some variable names include a "." extension. It seems R will ignore this extension and give me the variable anyway if I try to access it without the complete variable name. What is causing this/why does it happen? Below is a mini example of my problem.
y <- rnorm(100)
x <- rlnorm(100)
data <- cbind.data.frame(y,x)
colnames(data) <- c("y.rnorm","x.rlnorm")
# these both return the same thing
data$y
data$y.rnorm
R is set up to resolve partial matches by design.
Read section 3.4 & 4.3 of the R language definition.
3.4.1 Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
and
4.3.2 Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ‘...’ then partial matching is only applied to arguments that precede it.
update
As noted by Uwe, there may be a pending update to the R language definition, as the behaviour of [[ partial matching has changed. A look through R News shows the following listed under deprecated and defunct for the 3.1.0 release:
Partial matching when using the $ operator on data frames now throws a warning and may become defunct in the future. If partial matching is intended, replace foo$bar by foo[["bar", exact = FALSE]]
The $ operator is designed to do partial matching. See the Subsetting chapter of Advanced R by Hadley Wickham, Ctrl + F "partial matching":
There’s one important difference between $ and [[. $ does partial matching:
x <- list(abc = 1)
x$a
## [1] 1
x[["a"]]
## NULL
If you want to avoid this behaviour you can set the global option warnPartialMatchDollar to TRUE. Use with caution: it may affect behaviour in other code you have loaded (e.g., from a package).
