I'm trying to understand what backticks do in R.
From what I can tell, this is not explained in the ?Quotes documentation page for R.
For example, at the R console:
"[["
# [1] "[["
`[[`
# .Primitive("[[")
It seems to return the equivalent of:
get("[[")
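This observation can be checked in base R. The quoted form is a character string. The backticked form is the symbol [[, which R evaluates by looking up the object bound to it, the same lookup that get() performs. The names s and f below are only for illustration.

```r
# "[[" is a length-one character vector (a string)
s <- "[["
# `[[` is the symbol [[, evaluated to the primitive function it names
f <- `[[`

is.character(s)                     # TRUE
is.primitive(f)                     # TRUE
identical(f, get("[["))             # TRUE: backticks and get() find the same object
identical(as.name(s), quote(`[[`))  # TRUE: the string converted to a symbol
```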
A pair of backticks lets you refer to names or combinations of symbols that would otherwise be reserved or illegal. Reserved words, like if, are part of the language. Illegal names include non-syntactic combinations like c a t. R's documentation calls both categories, reserved and illegal, non-syntactic names.
Thus,
`c a t` <- 1 # is valid R
and
> `+` # is equivalent to typing in a syntactic function name
function (e1, e2) .Primitive("+")
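The following base-R sketch shows both points. A non-syntactic name must be backticked every time it is used. An operator such as + can be called like any other function once its name is backticked.

```r
# A non-syntactic object name must be backticked every time it is used
`c a t` <- 1
`c a t` + 1                   # 2

# Operators are ordinary functions; backticks let you call them by name
`+`(2, 3)                     # 5, same as 2 + 3
identical(`+`(2, 3), 2 + 3)   # TRUE
```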
As a commenter mentioned, ?Quotes does contain some information on the backtick, under Names and Identifiers:
Identifiers consist of a sequence of letters, digits, the period (.) and the underscore. They must not start with a digit nor underscore, nor with a period followed by a digit. Reserved words are not valid identifiers.
The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.
Such identifiers are also known as syntactic names and may be used directly in R code. Almost always, other names can be used provided they are quoted. The preferred quote is the backtick (`), and deparse will normally use it, but under many circumstances single or double quotes can be used (as a character constant will often be converted to a name). One place where backticks may be essential is to delimit variable names in formulae: see formula
This prose is a little hard to parse. It means that R parses a token as a name only if two conditions hold. First, the token is a sequence of letters, digits, periods, and underscores that does not start with a digit, an underscore, or a period followed by a digit. Second, it is not a reserved word in the language. Any other name must be wrapped in backticks to be parsed as a name.
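Base R's make.names() turns arbitrary strings into syntactic names, so it offers one way to check these rules. The candidate strings and the is_syntactic() helper below are hypothetical. The helper is only a rough check: it treats a name as syntactic when make.names() leaves it unchanged.

```r
candidates <- c("cat", "c a t", "if", "20a", "_x", ".2a", "tmp_weighted")

# Invalid characters become ".", "X" is prepended when the start is invalid,
# and reserved words get a trailing "."
make.names(candidates)
# "cat" "c.a.t" "if." "X20a" "X_x" "X.2a" "tmp_weighted"

# Hypothetical helper: a name counts as syntactic if make.names() keeps it as-is
is_syntactic <- function(x) x == make.names(x)
is_syntactic(candidates)
# TRUE FALSE FALSE FALSE FALSE FALSE TRUE
```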
Also check out ?Reserved:
Reserved words outside quotes are always parsed to be references to the objects linked to in the 'Description', and hence they are not allowed as syntactic names (see make.names). They are allowed as non-syntactic names, e.g. inside backtick quotes.
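This sketch uses reserved words as non-syntactic names. The list lst is made up for illustration. Inside backticks a reserved word is just a symbol, so it can label a list element. It can even be called as the function it refers to.

```r
# Reserved words can be used as names only inside backticks
lst <- list(`if` = 1, `function` = 2)
lst$`if`                   # 1
names(lst)                 # "if" "function"

# `if` is the symbol for the if function, so it can be called directly
`if`(TRUE, "yes", "no")    # "yes"
```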
In addition, Advanced R has some examples of how backticks are used in expressions, environments, and functions.
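The sketch below shows backticks in expressions and in a formula, using a made-up column name `my x` and toy data. As the ?Quotes passage notes, deparse() adds backticks when it turns a call containing a non-syntactic name back into text. A formula needs them to refer to such a column.

```r
# deparse() backticks non-syntactic names when turning a call back into text
deparse(quote(`my x` + 1))   # "`my x` + 1"

# In a formula, a non-syntactic column name must be backtick-quoted
d <- data.frame(`my x` = 1:5, y = c(2, 4, 6, 8, 10), check.names = FALSE)
fit <- lm(y ~ `my x`, data = d)
coef(fit)                    # intercept about 0, slope 2
```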
They let you use a name verbatim. For example, try this:
df <- data.frame(20a=c(1,2),b=c(3,4))
gives an error, because a syntactic name cannot start with a digit, while
df <- data.frame(`20a`=c(1,2),b=c(3,4))
does not give an error. (Because data.frame() defaults to check.names = TRUE, the column is still renamed to X20a.)
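This sketch shows what happens to that column name under two settings of data.frame()'s check.names argument. With the default check.names = TRUE, the name is repaired. With check.names = FALSE, it is kept verbatim, and $ then needs backticks to reach it.

```r
# By default data.frame() repairs non-syntactic column names
df1 <- data.frame(`20a` = c(1, 2), b = c(3, 4))
names(df1)    # "X20a" "b"

# check.names = FALSE keeps the name verbatim; $ then needs backticks
df2 <- data.frame(`20a` = c(1, 2), b = c(3, 4), check.names = FALSE)
names(df2)    # "20a" "b"
df2$`20a`     # 1 2
```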
Here is an incomplete answer using improper vocabulary: backticks can indicate to R that you are using a function in a non-standard way. For instance, here is a use of [[, the list subsetting function:
temp <- list("a" = 1:10, "b" = rnorm(5))
# extract element one, the usual way
temp[[1]]
# extract element one using the [[ function
`[[`(temp, 1)
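Because `[[` is an ordinary function, it can also be passed to functions such as sapply(). Here is a small sketch with a made-up list of records:

```r
records <- list(list(id = 1, name = "a"), list(id = 2, name = "b"))

# sapply() calls `[[`(record, 1) and `[[`(record, "name") on each record
sapply(records, `[[`, 1)        # 1 2
sapply(records, `[[`, "name")   # "a" "b"
```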
Related
I'm working with the following code:
Y_Columns <- c("Y.1.1")
paste('{"ImportId":"', Y_Columns, '"}', sep = "")
The paste function produces the following output:
"{\"ImportId\":\"Y.1.1\"}"
How do I get paste to omit the \, so that the output is:
"{"ImportId":"Y.1.1"}"
Thank you for your help.
Note: I did do a search on SO to see if there were any Q's that asked "what is an escape character in R". But I didn't review all the 160 answers, only the first 20.
This is one way of demonstrating what I wrote in my comment:
out <- paste('{"ImportId":"', Y_Columns, '"}', sep = "")
out
#[1] "{\"ImportId\":\"Y.1.1\"}"
?print
print(out,quote=FALSE)
#[1] {"ImportId":"Y.1.1"}
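Reusing the question's code, this sketch gives another way to see that the backslashes appear only in the printed representation and not in the string itself:

```r
Y_Columns <- c("Y.1.1")
out <- paste('{"ImportId":"', Y_Columns, '"}', sep = "")

nchar(out)                       # 20: no backslash characters are stored
grepl("\\", out, fixed = TRUE)   # FALSE: the string contains no backslash
writeLines(out)                  # {"ImportId":"Y.1.1"}
```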
Both R and regex patterns use escape characters so that special characters can be written in input and displayed in printed output. (Regex patterns written as R strings sometimes need doubled escapes.) R has a few characters that need to be "escaped" in certain situations. You illustrated one of them: a double-quote character inside a value that is printed with surrounding double quotes. Likewise, a single quote inside a value delimited by single quotes at creation time would also need to be escaped.
out2 <- '\'quoted\''
nchar(out2)
#[1] 8   ... neither the surrounding delimiter quotes nor the backslashes are counted
out2
#[1] "'quoted'"   ... and the default print quote character is a double quote.
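A few more base-R illustrations of the same idea follow. Every escape sequence is stored as a single character. print() shows the escapes, while cat() shows the characters themselves.

```r
# Each escape sequence is one character in the stored string
nchar("\"")     # 1: a double quote
nchar("\\")     # 1: a single backslash
nchar("a\tb")   # 3: a, TAB, b
nchar("a\nb")   # 3: a, newline, b

# print() shows the escapes; cat() writes the characters themselves
x <- "say \"hi\"\n"
print(x)        # [1] "say \"hi\"\n"
cat(x)          # say "hi"
```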
Here's a good Q&A to review: How to replace '+' using gsub() function in R
It has two answers, both useful: one shows how to double-escape a special character, and the other shows how to use the fixed argument to avoid that requirement.
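This sketch applies those approaches to a made-up string. + is a regex metacharacter. To match it literally, escape it (a doubled backslash in the R string), put it in a character class, or use fixed = TRUE.

```r
s <- "a+b+c"
gsub("\\+", "-", s)               # escaped metacharacter: "a-b-c"
gsub("[+]", "-", s)               # character class: "a-b-c"
gsub("+", "-", s, fixed = TRUE)   # no regex at all: "a-b-c"
```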
And another potentially useful Q&A on the topic of handling Windows paths:
File path issues in R using Windows ("Hex digits in character string" error)
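The usual fixes for that error are sketched below with a made-up path. The raw string literal r"(...)" assumes R 4.0.0 or later, the version that introduced that syntax.

```r
# "C:\Users\me" does not parse: \U starts a Unicode escape that needs hex digits
p1 <- "C:\\Users\\me"    # double every backslash
p2 <- "C:/Users/me"      # forward slashes also work on Windows
p3 <- r"(C:\Users\me)"   # raw string (R >= 4.0.0): backslashes are literal

identical(p1, p3)        # TRUE
```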
Some further reading suggestions: look at the series of help pages whose names start with capital letters. I can never remember which one has which nugget of essential information, so I tried ?Syntax first. Its "See Also" list is essential reading: Arithmetic, Comparison, Control, Extract, Logic, NumericConstants, Paren, Quotes, Reserved. I then realized that what I wanted to refer you to was most likely ?Quotes, which lists the R-specific escape sequences.
I encountered a strange problem with R. I have a data frame with several variables. I add a variable whose name contains an underscore, for example:
allres$tmp_weighted <- allres$day * allres$area
Before I do this, R tells me that allres$tmp does not exist (which is right). However, after I add allres$tmp_weighted to the data frame and call allres$tmp, I get the data for allres$tmp_weighted. It seems as if the part after the underscore does not matter at all to R. I tried it with several other variables/names, and it always works that way.
I don't think this should work like this. Am I overlooking something? Below is some code together with the console output.
# first check whether the variable exists
allres_sw$Ndpsw
# NULL
# define a new variable with an underscore in its name
allres_sw$Ndpsw_weighted <- allres_sw$Ndepswcrit * allres_sw$Area
# check again whether the variable exists
allres_sw$Ndpsw
# [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
# this is the output I would expect from "Ndpsw_weighted", and indeed do get
allres_sw$Ndpsw_weighted
# [1] 17.96480 217.50240 44.84415 42.14560 0.00000 43.14444 53.98650 9.81939 0.00000 110.67720
Have a look at ?`[` or ?`$` in your R console. The description of the name argument of the extract functions states that names are partially matched when using the $ operator. The [[ operator, by contrast, matches exactly by default, because its exact argument defaults to TRUE.
From ?`$`
A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
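Here is a small reproduction with made-up data that mimics the question's columns. The values are arbitrary. Setting options(warnPartialMatchDollar = TRUE) makes R warn whenever $ relies on a partial match.

```r
# Made-up data with the question's column names
d <- data.frame(Ndepswcrit = c(1, 2, 3), Area = c(10, 10, 10))
is.null(d$Ndpsw)                       # TRUE: no name starts with "Ndpsw" yet

d$Ndpsw_weighted <- d$Ndepswcrit * d$Area
identical(d$Ndpsw, d$Ndpsw_weighted)   # TRUE: $ partially matched the new column
is.null(d[["Ndpsw"]])                  # TRUE: [[ matches exactly by default
d[["Ndpsw", exact = FALSE]]            # 10 20 30: partial matching on request
```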
Just to expand somewhat on Wil's answer... From help('$'):
x$name
name: A literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object.
x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.
exact: Controls possible partial matching of [[ when extracting by a character vector (for most objects, but see under ‘Environments’). The default is no partial matching. Value NA allows partial matching but issues a warning when it occurs. Value FALSE allows partial matching without any warning.
The key phrase here is partial match (see pmatch). So the underscore is nothing special. You can abbreviate allres_sw$Ndpsw_weighted to allres_sw$Ndp, provided no other column name also starts with "Ndp". Ndepswcrit does not, because it starts with "Nde". If the prefix matched more than one name, $ would return NULL instead.
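To see the rule directly, apply pmatch() to the column names. A plain list then shows what $ does when a prefix is ambiguous. The values in this sketch are made up.

```r
cols <- c("Ndepswcrit", "Area", "Ndpsw_weighted")
pmatch("Ndp", cols)   # 3: unique prefix of "Ndpsw_weighted"
pmatch("Nd", cols)    # NA: prefix of two names, so ambiguous

# $ on a list follows the same rule: an ambiguous prefix gives NULL
l <- list(Ndepswcrit = 1, Ndpsw_weighted = 2)
l$Ndp                 # 2
l$Nd                  # NULL
```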
I have a list l whose printed output shows a grave accent (`) around some element names. Why am I getting this for some elements and not for others?
l
$`AMLM12PAH037A-B`
Left.Gene.Symbols Right.Gene.Symbols
PCMTD1 0 1
STK31 3 0
$AMLOT120AT
Left.Gene.Symbols Right.Gene.Symbols
ARHGEF3 2 0
CD96 2 0
RALYL 12 0
TRIO 0 1
Names that are not syntactic need special handling. In this case the culprit is the - inside AMLM12PAH037A-B. Depending on how you created such names, they are shown backticked (like yours), converted to syntactic names, or rejected with an error.
A syntactic name also cannot start with a number, among other restrictions.
See make.names() and the check.names argument of data.frame() and read.table().
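This sketch uses the names from the question's output with made-up element values. The names are stored without backticks. The backticks appear only when R prints a non-syntactic name or when you type one in code.

```r
l <- list(`AMLM12PAH037A-B` = 1:2, AMLOT120AT = 3:4)
names(l)                 # "AMLM12PAH037A-B" "AMLOT120AT" (no backticks stored)
l$`AMLM12PAH037A-B`      # backticks needed with $ because of the "-"
l[["AMLM12PAH037A-B"]]   # a plain string works with [[

make.names(names(l))     # "AMLM12PAH037A.B" "AMLOT120AT"
```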
From the R FAQ:
A syntactic name is a string the parser interprets as this type of expression. It consists of letters, numbers, and the dot and (for versions of R at least 1.9.0) underscore characters, and starts with either a letter or a dot not followed by a number. Reserved words are not syntactic names.
An object name is a string associated with an object that is assigned in an expression either by having the object name on the left of an assignment operation or as an argument to the assign() function. It is usually a syntactic name as well, but can be any non-empty string if it is quoted (and it is always quoted in the call to assign()).
An argument name is what appears to the left of the equals sign when supplying an argument in a function call (for example, f(trim=.5)). Argument names are also usually syntactic names, but again can be anything if they are quoted.
An element name is a string that identifies a piece of an object (a component of a list, for example.) When it is used on the right of the ‘$’ operator, it must be a syntactic name, or quoted. Otherwise, element names can be any strings. (When an object is used as a database, as in a call to eval() or attach(), the element names become object names.)
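The base-R sketch below illustrates the object, argument, and element names from that FAQ entry, each made non-syntactic. The names `my var`, args_seen() and `trim value` are made up for illustration.

```r
# Object name: any non-empty string via assign(); backticks to use it directly
assign("my var", 10)
`my var` * 2                    # 20
get("my var")                   # 10

# Argument name: quoted in the call, kept verbatim by list()
args_seen <- function(...) names(list(...))
args_seen(`trim value` = 0.5)   # "trim value"

# Element name: must be quoted on the right of $
e <- list("x y" = 1)
e$`x y`                         # 1
e[["x y"]]                      # 1
```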
This is perhaps rather a minor question...
but just a moment ago I was looking through some code I had written and noticed that I tend to just use ="something" and ='something_else' completely interchangeably, often in the same function.
So my question is: Is there R code in which using one or other (single or double quotes) has different behaviour? Or are they totally synonymous?
According to http://stat.ethz.ch/R-manual/R-patched/library/base/html/Quotes.html, "[s]ingle and double quotes delimit character constants. They can be used interchangeably but double quotes are preferred (and character constants are printed using double quotes), so single quotes are normally only used to delimit character constants containing double quotes."
Out of curiosity: the R-help mailing list has a further explanation of why double quotes are preferred in R:
To avoid confusion for those who are accustomed to programming in the C family of languages (C, C++, Java), where there is a difference in the meaning of single quotes and double quotes.
A C programmer reads 'a' as a single character and "a" as a character string consisting of the letter 'a' followed by a null character to terminate the string.
In R there is no character data type, there are only character strings. For consistency with other languages it helps if character strings are delimited by double quotes. The single quote version in R is for convenience.
(Since) On most keyboards you don't need to use the shift key to type a single quote but you do need the shift for a double quote.
> print(""hi"")
Error: unexpected symbol in "print(""hi"
> print("'hi'")
[1] "'hi'"
> print("hi")
[1] "hi"
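In other words, the choice of delimiter only decides which quote character must be escaped inside the string. A short sketch:

```r
# Single and double quotes build identical strings
identical('hi', "hi")   # TRUE

# Choose the delimiter that avoids escaping the other quote character
a <- 'say "hi"'
b <- "say \"hi\""
identical(a, b)         # TRUE
print(a)                # [1] "say \"hi\""
```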