R lists -- how are element names handled? - r

I have noticed this unexpected feature:
foo <- list(whatever=1:10)
Now, the following works as well:
foo$wha
foo$w
foo$whateve
However, the following does not:
foo[["wha"]]
This has the unexpected consequences (unexpected for me, that is) that if you have two potential names, like "CXCL1" and "CXCL11", and you want to know whether CXCL1 is not null by inspecting !is.null(foo$CXCL1), it will return TRUE even if CXCL1 null, but CXCL11 isn't.
My questions are:
How does that work?
What is the difference between foo$whatever and foo[["whatever"]]?
Why would anyone want this behavior and how do I disable it?

Partial matching only works with unique initial subsequences of list names. So, for example:
> l <- list(score=1, scotch=2)
> l$core #only INITIAL subsequences
NULL
> l$sco #only UNIQUE subsequences
NULL
> l$scor
[1] 1
Both [[ and $ select a single element of the list. The main differences are that $ does not allow computed indices, whereas [[ does, and that partial matching is allowed by default with operator $ but not with [[.
These Extract or Replacement operators come from S, although R restricts the use of partial matching, while S uses partial matching in most operators by default.
In your example, if CXCL1 and CXCL11 coexist and you index foo$CXCL1, that's not a partial match and should return CXCL1's value. If not, maybe there's another problem.
I should point out, [[ not allowing partial matching as default starts from version R 2.7.0 onwards.

Related

R unexpectedly auto-completing data.frame column names [duplicate]

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still an R newbie and this is new for me, partial matching of function arguments:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does this have any side effects? I want understand how can someone test if the xx argument was given?
Edit after Andrie comment:
Internally R uses pmatch algorithm to match argument, here an example how this works:
pmatch("me", c("mean", "median", "mode")) # error multiple partial matches
[1] NA
> pmatch("mo", c("mean", "median", "mode")) # mo match mode match here
[1] 3
But why R has such feature? What is the basic idea behind of partial unique matching?
Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
Note that when subsetting a tibble
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

R seems to ignore part of variable name after underscore

I encountered a strange problem with R. I have a dataframe with several variables. I add a variable to this dataframe that contains an underscore, for example:
allres$tmp_weighted <- allres$day * allres$area
Before I do this, R tells me that the variable allres$tmp does not exist (which is right). However, after I add allres$tmp_weighted to the dataframe and call allres$tmp, I get the data for allres$tmp_weighted. It seems as if the part after the underscore does not matter at all for R. I tried it with several other variables / names and it always works that way
I don't think this should work like this? Am I overlooking something here? Below I pasted some code together with output from the Console.
# first check whether variable exists
allres_sw$Ndpsw
> NULL
#define new variable with underscore in variable name
allres_sw$Ndpsw_weighted <- allres_sw$Ndepswcrit * allres_sw$Area
#check again whether variable exists
allres_sw$Ndpsw
> [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
# this is the output that I would expect from "Ndpsw_weighted" - and indeed do get
allres_sw$Ndpsw_weighted
> [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
Have a look at ?`[` or ?`$` in your R console. If you look at the name argument of the extract functions it states that names are partially matched when using the $ operator (as opposed to the `[[` operator, which uses exact matches based on the exact = TRUE argument).
From ?`$`
A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
Just to expand somewhat on Wil's answer... From help('$'):
x$name
name
A literal character string or a name (possibly backtick
quoted). For extraction, this is normally (see under
‘Environments’) partially matched to the names
of the object.
x$name is equivalent to
x[["name", exact = FALSE]]. Also, the partial matching
behavior of [[ can be controlled using the exact argument.
exact
Controls possible partial matching of [[ when
extracting by a character vector (for most objects, but see under
‘Environments’). The default is no partial matching. Value
NA allows partial matching but issues a warning when it
occurs. Value FALSE allows partial matching without any
warning.
The key phrase here is partial match (see pmatch). You'll understand now that the underscore is nothing special - you can abbreviate allres_sw$Ndpsw_weighted to allres_sw$Ndp, provided no name is more similar than allres_sw$Ndepswcrit.

Does R ignore variable name extensions starting with a dot in a data frame?

I have a data frame where some variable names include a "." extension. It seems R will ignore this extension and give me the variable anyway if I try to access it without the complete variable name. What is causing this/why does it happen? Below is a mini example of my problem.
y <- rnorm(100)
x <- rlnorm(100)
data <- cbind.data.frame(y,x)
colnames(data) <- c("y.rnorm","x.rlnorm")
# these both return the same thing
data$y
data$y.rnorm
R is setup to provide results to partial matches by design.
Read section 3.4 & 4.3 of the R language definition.
3.4.1 Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
and
4.3.2 Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ‘...’ then partial matching is only applied to arguments that precede it.
update
As noted by Uwe, there may be a pending update to the R language definition as the behaviour of [[ partial matching has changed. A look through R News shows the following as deprecated and defunct with the 3.1.0 release:
Partial matching when using the $ operator on data frames now throws a warning and may become defunct in the future. If partial matching is intended, replace foo$bar by foo[["bar", exact = FALSE]]
The $ operator is designed to do partial matching. See the Subsetting chapter of Advanced R by Hadley Wickham, Ctrl + F "partial matching":
There’s one important difference between $ and [[. $ does partial matching:
x <- list(abc = 1)
x$a
## [1] 1
x[["a"]]
## NULL
If you want to avoid this behaviour you can set the global option warnPartialMatchDollar to TRUE. Use with caution: it may affect behaviour in other code you have loaded (e.g., from a package).

Why $ guesses the column names when used for data frame? [duplicate]

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still an R newbie and this is new for me, partial matching of function arguments:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does this have any side effects? I want understand how can someone test if the xx argument was given?
Edit after Andrie comment:
Internally R uses pmatch algorithm to match argument, here an example how this works:
pmatch("me", c("mean", "median", "mode")) # error multiple partial matches
[1] NA
> pmatch("mo", c("mean", "median", "mode")) # mo match mode match here
[1] 3
But why R has such feature? What is the basic idea behind of partial unique matching?
Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
Note that when subsetting a tibble
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

Why does R use partial matching?

I know that for a list, partial matching is done when indexing using the basic operators $ and [[. For example:
ll <- list(yy=1)
ll$y
[1] 1
But I am still an R newbie and this is new for me, partial matching of function arguments:
h <- function(xx=2)xx
h(x=2)
[1] 2
I want to understand how this works. What is the mechanism behind it? Does this have any side effects? I want understand how can someone test if the xx argument was given?
Edit after Andrie comment:
Internally R uses pmatch algorithm to match argument, here an example how this works:
pmatch("me", c("mean", "median", "mode")) # error multiple partial matches
[1] NA
> pmatch("mo", c("mean", "median", "mode")) # mo match mode match here
[1] 3
But why R has such feature? What is the basic idea behind of partial unique matching?
Partial matching exists to save you typing long argument names. The danger with it is that functions may gain additional arguments later on which conflict with your partial match. This means that it is only suitable for interactive use – if you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name. The other problem is that by abbreviating an argument name, you can make your code less readable.
Two common good uses are:
len instead of length.out with the seq (or seq.int) function.
all instead of all.names with the ls function.
Compare:
seq.int(0, 1, len = 11)
seq.int(0, 1, length.out = 11)
ls(all = TRUE)
ls(all.names = TRUE)
In both of these cases, the code is just about as easy to read with the shortened argument names, and the functions are old and stable enough that another argument with a conflicting name is unlikely to be added.
A better solution for saving on typing is, rather than using abbreviated names, to use auto-completion of variable and argument names. R GUI and RStudio support this using the TAB key, and Architect supports this using CTRL+Space.
Some relevant sections of R Language Definition:
3.4.1 Indexing by vectors
...assume that the expression is x[i]. Then the following possibilities exist according to the type of i
Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
[see also ?Extract]
4.3.2 Argument matching
The first thing that occurs in a function evaluation is the matching of formal to the actual or supplied arguments. This is done by a three-pass process:
Exact matching on tags. For each named supplied argument the list of formal arguments is searched for an item whose name matches exactly. It is an error to have the same formal argument match several actuals or vice versa.
Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ... then partial matching is only applied to arguments that precede it.
Positional matching.
Note that when subsetting a tibble
Partial matching of column names with $ and [[ is not supported, and NULL is returned. For $, a warning is given.

Resources