When are Backticks ' ' used compared to Double Quotes " "? - r

What I know so far ...
1) Backticks are used when creating tibbles with non-syntactic variable/column names that contain numbers, spaces, or other symbols (because normally you can only name columns with letters right?)
tb <- tibble(
': )' = "smile", ' ' = "space",
'2000' = "number", "double_quotes" = "normal_text")
However, when I use double quotes here the tibble still forms with the nonsyntactic symbols/numbers.
2) Double quotes are used to subset column names when using double brackets.
tb[["double_quotes"]]
And here, when I use single quotes to subset, it still works as well.
3) When subsetting using $, to select for nonsyntactic names, I must use single quotes, but here again, if I subset using double quotes, it works as well
Again, tb$": )" works just as well as tb$': )'
So are they effectively interchangeable?
Interestingly, when I plot a graph
annoying <- tibble(
`1` = 1:10,
`2` = `1` * 2 + rnorm(length(`1`))
)
ggplot(annoying, aes(x = `1`, y = `2`)) +
geom_point()
Single quotes must be used when referring to the nonsyntactic variables because otherwise, it looks like ggplot treats X and Y as single points of 1 and 2 respectively. Are there any other cases like this?

It's important to distinguish between single quotes (') and backticks (or "back-single-quotes") (`).
Most of what you want to know is in ?Quotes:
Single (') and double (") quotes delimit character constants. They can be
used interchangeably but double quotes are preferred (and
character constants are printed using double quotes), so single
quotes are normally only used to delimit character constants
containing double quotes.
Almost always, other [i.e., non-syntactically valid] names can be used
provided they are quoted. The preferred quote is the backtick
(‘`’) ... under many
circumstances single or double quotes can be used (as a character
constant will often be converted to a name). One place where
backticks may be essential is to delimit variable names in
formulae: see ‘formula’.
For example, if you want to define a variable name containing a space, you need back-ticks:
`a b` <- 1
Double quotes also work here (to my surprise!)
"a b" <- 1
but if you want to use the resulting variable in an expression, you'll need to use back-ticks. "a b" + 1 gives an error ("non-numeric argument to binary operator"), but `a b` + 1 works.
As @r2evans points out, the same rules apply in tidyverse expressions. You can use double or single quotes (if you want) to define new variables: mtcars %>% mutate("my stuff" = 4), but if you want to subsequently use that variable (or any other non-syntactic variable) in an expression, you have to backtick-protect it: mtcars %>% mutate("my stuff" = 4, new = `my stuff` + 5).
It's probably best practice/least confusing to just use backticks for all non-syntactic variable reference and single quotes for character constants.
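To make the distinction concrete, here is a minimal base-R sketch. The data frame df and its column names are invented for illustration, and nothing beyond base R is assumed.

```r
# Made-up data frame with one non-syntactic column name.
df <- data.frame(`my col` = c(1, 2, 3), y = c(2.1, 3.9, 6.2), check.names = FALSE)

# Character constants: single and double quotes produce the same string.
same_string <- identical('my col', "my col")

# Subsetting accepts either a string or a backticked name.
by_brackets <- df[["my col"]]
by_string   <- df$"my col"
by_backtick <- df$`my col`

# Inside an expression, "my col" would be a plain string;
# only the backticked form refers to the column.
doubled <- with(df, `my col` * 2)

# Formulas are one place where backticks are required for such names.
fit <- lm(y ~ `my col`, data = df)
```

In short: quotes are fine where R only needs the name as a label (defining or subsetting), while backticks are what you want whenever the name has to be evaluated.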

Related

Add thousand separator to levels in cut function

My x-axis labels look like [100000,250000], which makes the numbers hard to read at first sight. I want them to look like [100.000,250.000]. I know that the cut2 function has a formatfun parameter, but I don't think I know how to use it properly.
Try using the formatC function on your cut data, e.g.
formatC(my_cuts, big.mark = ".", decimal.mark = ",")
Let's create an example to work on:
x <- cut(seq(0,1,length.out=8) + 1e6, 3)
This is a factor. Although at bottom it's a numeric array, you don't want to format its values; you want to format its levels, which are the strings associated with its values. This is what the levels look like in the example (calling head to prevent lots of printing in case x has many distinct levels):
(head(levels(x)))
[1] "(1000000,1000000.3]" "(1000000.3,1000000.7]" "(1000000.7,1000001]"
To format the levels, we need to pick them apart into their numeric components (which are separated by a comma ","), format each component, and reassemble the results.
Here's the picking-apart-and-formatting step in one go, using only base R functionality. It calls gsub and strsplit on the first line (for cleaning out the "(" and "]" characters and splitting each pair of numeric strings into two strings) and employs prettyNum on the second line (for the formatting), which conveniently will format any character string that looks like a number:
s <- lapply(strsplit(gsub("]|[(]", "", levels(x)), ","),
prettyNum, big.mark=".", decimal.mark=",", input.d.mark=".", preserve.width="individual")
(You might not need the input.d.mark argument, but I did because my locale uses "." for a decimal point, as you could see above. The docs say "individual" is the default for setting the output width, but that just isn't the case on my system: I had to specify it explicitly.)
The paste* functions will perform the reassembly, whose results we simply re-assign to the levels of x:
levels(x) <- paste0("(", sapply(s, function(a) paste0(a, collapse="; ")), "]")
(Since each number potentially already includes "," and "." delimiters, I have specified a third punctuation mark, ";", to separate the numbers themselves -- but you may use what you wish, of course.)
Let's display the new levels to verify the results:
(head(levels(x)))
[1] "(1.000.000; 1.000.000,3]" "(1.000.000,3; 1.000.000,7]" "(1.000.000,7; 1.000.001]"
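If you need this more than once, the same steps can be wrapped in a small function. This is only a sketch: format_cut_levels and its argument names are hypothetical, and like the code above it assumes levels of the form "(a,b]".

```r
# Hypothetical helper; the name and arguments are illustrative, not from any package.
format_cut_levels <- function(f, big.mark = ".", decimal.mark = ",", sep = "; ") {
  # Strip the "(" and "]" delimiters, then split each level into its two endpoints.
  parts <- strsplit(gsub("]|[(]", "", levels(f)), ",")
  # Format each endpoint string; input.d.mark says the input uses "." for decimals.
  parts <- lapply(parts, prettyNum, big.mark = big.mark, decimal.mark = decimal.mark,
                  input.d.mark = ".", preserve.width = "individual")
  # Reassemble the pairs and assign them back as the factor's levels.
  levels(f) <- paste0("(", sapply(parts, paste0, collapse = sep), "]")
  f
}

x <- cut(seq(0, 1, length.out = 8) + 1e6, 3)
x_formatted <- format_cut_levels(x)
levels(x_formatted)
```

Only the level labels change; the underlying integer codes of the factor stay the same, so the result can go straight into a plot.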

Match text strings containing quotation marks which are encoded differently

I have two data frames containing the same information. The first contains a unique identifier. I would like to use dplyr::inner_join to match by title.
Unfortunately, one of the data frames contains {"} to signify a quote and the other simply uses a single quote.
For example, I would like to match the two titles shown below.
The {"}Level of Readiness{"} for HCV treatment
The 'Level of Readiness' for HCV treatment
You can turn them into single quotes using gsub, but you need to enclose {"} with single quotes and ' with double quotes. Note that fixed = TRUE treats '{"}' as a literal string instead of a regular expression:
gsub('{"}', "'", 'The {"}Level of Readiness{"} for HCV treatment', fixed = TRUE)
# [1] "The 'Level of Readiness' for HCV treatment"
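Applied to a whole column before joining, that might look like the sketch below. The two data frames and their columns are made up, and base merge() stands in for dplyr::inner_join so that the example runs without extra packages.

```r
# Made-up data frames for illustration.
with_id <- data.frame(id = 1L,
                      title = 'The {"}Level of Readiness{"} for HCV treatment',
                      stringsAsFactors = FALSE)
no_id <- data.frame(title = "The 'Level of Readiness' for HCV treatment",
                    note = "example row",
                    stringsAsFactors = FALSE)

# Normalise the {"} markers to single quotes; fixed = TRUE avoids regex interpretation.
with_id$title <- gsub('{"}', "'", with_id$title, fixed = TRUE)

# Inner join on the cleaned title (merge() keeps only matching rows by default).
matched <- merge(with_id, no_id, by = "title")
```

Once the title column is cleaned, an inner_join(with_id, no_id, by = "title") should behave the same way.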

How to remove starting(suffix) special character("_") from column names [duplicate]

After I collapse my rows and separate using a semicolon, I'd like to delete the semicolons at the front and back of my string. Multiple semicolons represent blanks in a cell. For example an observation may look as follows after the collapse:
;TX;PA;CA;;;;;;;
I'd like the cell to look like this:
TX;PA;CA
Here is my collapse code:
new_df <- group_by(old_df, unique_id) %>% summarize_each(funs(paste(., collapse = ';')))
If I try to gsub for semicolon it removes all of them. If I remove the end character it just removes one of the semicolons. Any ideas on how to remove all at the beginning and end, but leaving the ones in between the observations? Thanks.
Use the regular expression ^;+|;+$
x <- ";TX;PA;CA;;;;;;;"
gsub("^;+|;+$", "", x)
The ^ indicates the start of the string, the + indicates one or more matches, and $ indicates the end of the string. The | states "OR". So, combined, it searches for any number of ; at the start of a string OR any number of ; at the end of the string, and replaces those with an empty string.
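Since gsub is vectorised, the same call works on a whole column. A small sketch with made-up values, including a cell that consists only of semicolons:

```r
# Made-up cells: leading/trailing runs, interior blanks, and an all-blank cell.
cells <- c(";TX;PA;CA;;;;;;;", "NY;;", "TX;;PA", ";;;")

# Trim runs of ";" at either end; interior semicolons are left alone.
trimmed <- gsub("^;+|;+$", "", cells)
trimmed
# [1] "TX;PA;CA" "NY"       "TX;;PA"   ""
```

Note that the interior ";;" in the third cell survives, because the pattern is anchored to the ends of the string.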
The stringi package allows you to specify patterns which you wish to preserve and trim everything else. If you only have letters there (though you could specify other pattern too), you could simply do
stringi::stri_trim_both(";TX;PA;CA;;;;;;;", "\\p{L}")
## [1] "TX;PA;CA"

Using percent operators with double colon in R

Is there a way to use percent operators in R with the double colon notation?
For example:
foreach::%dopar%
foreach::"%dopar%"
Even though quotes work in the double colon case, when referring to an operator like this, you should enclose the operator in single back ticks:
foreach::`%dopar%`
Backticks let you refer to any name that is not a legal identifier (a legal identifier starts with a letter, or a dot not followed by a digit, and is made up only of letters, digits, dots, and underscores).
`%%`(6, 4) # Calling the mod operator in a weird way
`strange %^*&` <- 2 # Defining a weird variable
`strange %^*&` + `strange %^*&` # Using the weird variable
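The same pattern can be tried without installing anything by using an operator from base. In this sketch `%isin%` is a hypothetical alias, introduced only to show that the namespaced operator can be bound to a local name and then used infix.

```r
# Call an infix operator through its namespace, in prefix form.
in_set <- base::`%in%`(2, 1:3)

# Bind the namespaced operator to a local name so it can be used infix.
`%isin%` <- base::`%in%`   # hypothetical alias
also_in_set <- 2 %isin% 1:3
```

The second form is handy when you want the infix syntax but do not want to attach the whole package with library().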

Removing Two Characters From A String

Related question here.
So I have a character vector with currency values that contain both dollar signs and commas. However, I want to try and remove both the commas and dollar signs in the same step.
This removes dollar signs:
d = c("$0.00", "$10,598.90", "$13,082.47")
gsub('\\$', '', d)
This removes commas:
library(stringr)
str_replace_all(c("10,0","tat,y"), fixed(","), "")
I'm wondering if I could remove both characters in one step.
I realize that I could just save the gsub results into a new variable, and then reapply that (or another function) on that variable. But I guess I'm wondering about a single step to do both.
Since answering in the comments is bad:
gsub('\\$|,', '', d)
replaces either $ or (|) , with an empty string.
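Take a look at ?regexp for additional special regex notation. For example, [[:punct:]] matches any punctuation character, so note that it also strips the decimal point: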
> gsub('[[:punct:]]', '', d)
[1] "000" "1059890" "1308247"
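If the goal is to end up with numbers, a character class that lists only the two unwanted characters keeps the decimal point intact. A minimal sketch using the same vector d as in the question:

```r
d <- c("$0.00", "$10,598.90", "$13,082.47")

# Inside [...] the $ is a literal character, so no escaping is needed.
cleaned <- gsub("[$,]", "", d)
cleaned
# [1] "0.00"     "10598.90" "13082.47"

# The cleaned strings convert directly to numeric values.
amounts <- as.numeric(cleaned)
```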
