R extract operator: [ vs $ [duplicate] - r

There are multiple posts on the internet about the differences and similarities between [[ and $. Some of them recommend $ only for interactive use and not for programming. However, I am not sure whether this is just a preference or whether there is a technical reason behind it.
Now let's say I am writing a package or function. If I am extracting an element by name (e.g., mtcars[["mpg"]]), why should I avoid using mtcars$mpg?

There are two differences that really matter between [[ and $:
1. [[ works with strings (i.e. it supports variable substitution), $ doesn't. If you have my_var = "mpg", you can use mtcars[[my_var]], but there isn't a good way to use my_var with $.
2. $ auto-completes if a partial column name is unambiguous. mtcars$m will return the mpg column, mtcars[["m"]] will return NULL. mtcars$d will return NULL because multiple columns start with a "d".
#1 makes [[ more flexible for programming - it's extremely common in programmatic use to be working with column names stored as strings.
#2 makes $ more dangerous - you should not use abbreviated column names in programming, however in interactive use it can be nice and quick. (Though this is largely moot with RStudio's auto-completion features, if you use that IDE.)
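Both points can be checked directly on the built-in mtcars data. This is only a small sketch; the variable names (by_string, partial_ok, and so on) are made up for the illustration.

```r
# Column name held in a variable: works with [[ but not with $
my_var <- "mpg"
by_string <- mtcars[[my_var]]   # the mpg column
by_dollar <- mtcars$my_var      # looks for a column literally called "my_var"

# Partial matching: $ completes an unambiguous prefix, [[ does not
partial_ok  <- mtcars$m         # only "mpg" starts with "m"
partial_amb <- mtcars$d         # "disp" and "drat" both start with "d"
exact_only  <- mtcars[["m"]]    # [[ matches exactly by default

print(identical(by_string, mtcars$mpg))   # TRUE
print(is.null(by_dollar))                 # TRUE
print(identical(partial_ok, mtcars$mpg))  # TRUE
print(is.null(partial_amb))               # TRUE
print(is.null(exact_only))                # TRUE
```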

$ does partial matching: if you have a column named xxx in a dataframe dat, then dat$xx will return the xxx column (unless you also have a xx column). This can be dangerous.
I always use [["..."]] for another reason: I use RStudio, and there is a nice highlighting for strings, whereas there's no highlighting with $.

Related

Why R is returning object when names are partially matching? [duplicate]

Let's say I have the list below:
Dat = list('AAA' = 1:4, 'BBB' = 5:9)
Now I run the following:
Dat$AA
## [1] 1 2 3 4
My question is why R returns a value for Dat$AA, given that there is no element with that name. Is R matching names partially?
If so, I think this behaviour is very risky and should not be allowed.
According to ?Extract
Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
Also, it is written under name
name - A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
where the usage is
x$name
Because exact is FALSE there, $ allows partial matching. This is one of the reasons to use [[, where exact defaults to TRUE, as its usage shows:
x[[i, exact = TRUE]]
?Extract also mentions an option that makes partial matching with $ produce a warning:
Thus the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by $. Even in that case, warnings can be switched on by options(warnPartialMatchDollar = TRUE).
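The effect of the exact argument can be seen on the list from the question. This is a small sketch; the result names are only illustrative.

```r
Dat <- list(AAA = 1:4, BBB = 5:9)

# Default (exact = TRUE): no partial matching, so "AA" finds nothing
exact_default <- Dat[["AA"]]

# exact = FALSE: partial matching without a warning, like Dat$AA
exact_false <- Dat[["AA", exact = FALSE]]

# exact = NA: partial matching, but R issues a warning when it happens;
# the warning is suppressed here only to keep the output short
exact_na <- suppressWarnings(Dat[["AA", exact = NA]])

print(is.null(exact_default))           # TRUE
print(identical(exact_false, Dat$AA))   # TRUE
print(identical(exact_na, 1:4))         # TRUE
```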
akrun already explained that this behaviour is documented, but generally we prefer that this doesn't happen. Consequently, a set of lines that I always put into my .Rprofile is the following:
options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchDollar = TRUE,
  warnPartialMatchAttr = TRUE
)
This runs when R starts and generates a warning whenever partial matching occurs. It covers the three contexts where partial matching happens: in function arguments, when using $, and when using attr(). I don't think you can turn partial matching off entirely, but this highlights it whenever it happens and helps prevent the ensuing bugs.
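To see the $ warning in action without editing .Rprofile, the option can be set temporarily. A minimal sketch, reusing the Dat list from the question; the tryCatch wrapper is only there to capture the warning text.

```r
Dat <- list(AAA = 1:4, BBB = 5:9)

# Turn the warning on for this session and remember the previous setting
old <- options(warnPartialMatchDollar = TRUE)

# Capture the warning message instead of letting it print
msg <- tryCatch(
  {
    Dat$AA
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)

# Restore the previous setting
options(old)

print(msg)
```

With the option off, the same call would return 1:4 silently.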

Aside from partial matching, can the $ operator do anything that [ and [[ cannot?

I believe the following to be true of the $ operator:
It allows names to be partially matched. For example, data$Sky can match data$Skywalker if there is no Sky name in use.
It cannot be used for atomic vectors, unlike [ and [[.
It cannot be combined with operators like -. There is no valid syntax like mtcars$-mpg. [ and [[ cannot do this with names either, but mtcars[,-1] works.
It is for names only.
Partial matching aside, for a data frame, data$name is equivalent to data[,"name"] e.g. mtcars$cyl is the same as mtcars[,"cyl"]. I'm pretty sure that data[["name"]] is also equivalent, e.g. mtcars[["cyl"]].
Partial matching aside, for a named list that is not a data frame, data$name is the same as data[["name"]].
Does this mean that if I don't care about partial matching, I can always replace $ with [ or [[? Or is there some functionality that I've missed?
For base R, my best guess comes from the documentation for $. The following quotes are the most relevant:
$ is only valid for recursive objects
$ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
the default behaviour is to use partial matching only when extracting from recursive objects (except environments) by $. Even in that case, warnings can be switched on by options(warnPartialMatchDollar = TRUE).
So it seems that the documentation confirms my belief that, aside from partial matching, $ is just syntactic sugar. However, there are four points where I am unsure:
I never put too much faith in R's documentation. Because of this, I'm sure that an experienced user will be able to find a hole in what I've said.
I say that this is only my guess for base R because $ is a generic operator and can therefore have its meaning changed by packages, tibbles being a common example.
$ and [ can also be used for environments, but I have never seen anyone do so.
I don't know what "computed indices" are.
According to the book Advanced R, the $ and [[ operators behave the same way, apart from partial matching. It states:
$ is a shorthand operator: x$y is roughly equivalent to x[["y"]].
...The one important difference between $ and [[ is that $ does
(left-to-right) partial matching:
The quote is from Section 4.3.2 at the following link:
https://adv-r.hadley.nz/subsetting.html#section-1
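Several of the points raised in the question can be checked in base R. The sketch below uses made-up objects (v, lst, e) and assumes no package has redefined $ for them. A "computed index" here simply means an index that is the result of evaluating an expression rather than a name typed literally.

```r
# $ is only valid for recursive objects: it errors on an atomic vector
v <- c(a = 1, b = 2)
atomic_err <- tryCatch(v$a, error = function(e) "error")
atomic_ok  <- v[["a"]]                  # [[ works on atomic vectors

# Computed index: the name is built by an expression at run time
lst <- list(x1 = "first", x2 = "second")
i <- 2
computed <- lst[[paste0("x", i)]]       # selects element "x2"

# Environments: $ works, but does no partial matching
e <- new.env()
e$Skywalker <- 1
env_partial <- e$Sky                    # NULL, unlike a list

print(atomic_err)            # "error"
print(atomic_ok)             # 1
print(computed)              # "second"
print(is.null(env_partial))  # TRUE
```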

Why "R help" does not work for some commands? [duplicate]

I wanted to use R's help to look up commands such as "for", "if", "while" and "repeat", but I get no information for them. Why is that?
I use "R help" for the above commands like below:
?for
?while
?if
?repeat
R requires that keywords are used in a syntactically valid form. For instance, R expects that if is followed by an expression in parentheses and then a body, so ?if is not valid R syntax.
Conversely, ? is an operator that expects an identifier after it.
To make it valid, you should quote the if identifier in backticks. That way, R parses the expression as ? followed by an identifier, rather than ? followed by an incomplete if expression:
?`if`
Backtick-quoting is R’s way of saying: “hey, that thing between backticks is a valid identifier, even if it totally doesn’t look like one”. You could (but generally shouldn’t!) totally use it to use wonky variable names:
`name with spaces` = 2
message(`name with spaces` + 5)
# 7
This feature is more useful when applied to column names of externally imported data (which sometimes contains spaces or other invalid identifier characters), or when defining operators.
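A short sketch of those two uses. The column name "body mass" and the operator %+% are invented for the example.

```r
# A column name containing a space, as might arrive from an imported file;
# check.names = FALSE stops data.frame() from rewriting it
df <- data.frame(`body mass` = c(70, 82), check.names = FALSE)
total <- sum(df$`body mass`)

# Defining a custom infix operator also needs backticks
`%+%` <- function(a, b) paste(a, b)
greeting <- "hello" %+% "world"

print(names(df))   # "body mass"
print(total)       # 152
print(greeting)    # "hello world"
```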

R seems to ignore part of variable name after underscore

I encountered a strange problem with R. I have a dataframe with several variables. I add a variable to this dataframe that contains an underscore, for example:
allres$tmp_weighted <- allres$day * allres$area
Before I do this, R tells me that the variable allres$tmp does not exist (which is right). However, after I add allres$tmp_weighted to the dataframe and call allres$tmp, I get the data for allres$tmp_weighted. It seems as if the part after the underscore does not matter at all for R. I tried it with several other variables / names and it always works that way
I don't think this should work like this? Am I overlooking something here? Below I pasted some code together with output from the Console.
# first check whether variable exists
allres_sw$Ndpsw
> NULL
#define new variable with underscore in variable name
allres_sw$Ndpsw_weighted <- allres_sw$Ndepswcrit * allres_sw$Area
#check again whether variable exists
allres_sw$Ndpsw
> [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
# this is the output that I would expect from "Ndpsw_weighted" - and indeed do get
allres_sw$Ndpsw_weighted
> [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
Have a look at ?`[` or ?`$` in your R console. If you look at the name argument of the extract functions it states that names are partially matched when using the $ operator (as opposed to the `[[` operator, which uses exact matches based on the exact = TRUE argument).
From ?`$`
A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
Just to expand somewhat on Wil's answer... From help('$'):
x$name
name - A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
exact - Controls possible partial matching of [[ when extracting by a character vector (for most objects, but see under ‘Environments’). The default is no partial matching. Value NA allows partial matching but issues a warning when it occurs. Value FALSE allows partial matching without any warning.
The key phrase here is partial match (see pmatch). You'll understand now that the underscore is nothing special: you can even abbreviate allres_sw$Ndpsw_weighted to allres_sw$Ndp, provided no other column name starts with Ndp. Ndepswcrit, for example, does not, because it diverges after Nd.
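The behaviour can be reproduced with a small stand-in data frame. The values below are invented; only the column names follow the question.

```r
# Hypothetical stand-in for the question's data frame
allres <- data.frame(Ndepswcrit = c(1, 2), Area = c(10, 20))

before <- allres$Ndpsw          # NULL: no column starts with "Ndpsw" yet
allres$Ndpsw_weighted <- allres$Ndepswcrit * allres$Area
after <- allres$Ndpsw           # now partially matches Ndpsw_weighted
short <- allres$Ndp             # an even shorter unique prefix also works
ambiguous <- allres$Nd          # NULL: two columns start with "Nd"

# pmatch() applies the same unique-prefix rule
pos_unique    <- pmatch("Ndp", names(allres))
pos_ambiguous <- pmatch("Nd", names(allres))

print(is.null(before))                            # TRUE
print(identical(after, allres$Ndpsw_weighted))    # TRUE
print(identical(short, allres$Ndpsw_weighted))    # TRUE
print(is.null(ambiguous))                         # TRUE
print(pos_unique)                                 # 3
print(pos_ambiguous)                              # NA
```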

R data table issue

I'm having trouble working with a data table in R. This is probably something really simple but I can't find the solution anywhere.
Here is what I have:
Let's say t is the data table
colNames <- names(t)
for (col in colNames) {
print (t$col)
}
When I do this, it prints NULL. However, if I do it manually, it works fine -- say a column name is "sample". If I type t$"sample" into the R prompt, it works fine. What am I doing wrong here?
You need t[[col]]; t$col does an odd form of evaluation.
edit: incorporating @joran's explanation:
t$col tries to find an element literally named 'col' in list t, not what you happen to have stored as a value in a variable named col.
$ is convenient for interactive use, because it is shorter and one can skip quotation marks (e.g. t$foo vs. t[["foo"]]). It also does partial matching, which is very convenient but can under unusual circumstances be dangerous or confusing: for example, if a list contains an element foolicious, then t$foo will retrieve it. For this reason it is not generally recommended for programming.
[[ can take either a literal string ("foo") or a string stored in a variable (col), and does not do partial matching. It is generally recommended for programming (although there's no harm in using it interactively).
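Applied to the loop from the question, the fix looks roughly like this. A plain data frame named tbl (a made-up name, chosen to avoid shadowing base R's t() function) stands in for the data table.

```r
# Hypothetical stand-in for the question's table
tbl <- data.frame(sample = 1:3, value = c(2.5, 3.5, 4.5))

for (col in names(tbl)) {
  # [[ uses the string stored in col; tbl$col would look for a column named "col"
  print(tbl[[col]])
}

# For comparison: there is no column literally called "col"
print(is.null(tbl$col))   # TRUE
```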
