Is there a way to use percent operators in R with the double colon notation?
For example:
foreach::%dopar%
foreach::"%dopar%"
Even though quotes work in the double colon case, when referring to an operator like this, you should enclose the operator in single back ticks:
foreach::`%dopar%`
Backticks let you refer to any name that is not a legal identifier (a legal identifier consists only of letters, digits, periods, and underscores, and starts with a letter or with a period that is not followed by a digit).
`%%`(6, 4) # Calling the mod operator in a weird way
`strange %^*&` <- 2 # Defining a weird variable
`strange %^*&` + `strange %^*&` # Using the weird variable
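The same backtick form works after `::` for any package, not only foreach. A minimal sketch using base R operators so that it runs without extra packages; the names `in_result`, `mod_result`, and `%is_in%` are made up for illustration:

```r
# Call an operator through its namespace by backtick-quoting its name.
in_result  <- base::`%in%`(3, 1:5)   # same as 3 %in% 1:5
mod_result <- base::`%%`(6, 4)       # same as 6 %% 4

# Bind a namespaced operator to a local name to get infix syntax
# without attaching the package (hypothetical operator name).
`%is_in%` <- base::`%in%`
infix_result <- 3 %is_in% 1:5
```

The same local-binding pattern should work for `foreach::`%dopar%``, assuming the foreach package is installed.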
What I know so far ...
1) Backticks are used when creating tibbles with non-syntactic variable/column names that contain numbers, spaces, or other symbols (because normally you can only name columns with letters right?)
tb <- tibble(
': )' = "smile", ' ' = "space",
'2000' = "number", "double_quotes" = "normal_text")
However, when I use double quotes here the tibble still forms with the nonsyntactic symbols/numbers.
2) Double quotes are used to subset column names when using double brackets.
tb[["double_quotes"]]
And here, when I use single quotes to subset, it still works as well.
3) When subsetting using $, to select for nonsyntactic names, I must use single quotes, but here again, if I subset using double quotes, it works as well
Again, tb$": )" works just as well as tb$': )'
So are they effectively interchangeable?
Interestingly, when I plot a graph
annoying <- tibble(
`1` = 1:10,
`2` = `1` * 2 + rnorm(length(`1`))
)
ggplot(annoying, aes(x = `1`, y = `2`)) +
geom_point()
Single quotes must be used when referring to the nonsyntactic variables because otherwise, it looks like ggplot treats X and Y as single points of 1 and 2 respectively. Are there any other cases like this?
It's important to distinguish between single quotes (') and backticks (or "back-single-quotes") (`).
Most of what you want to know is in ?Quotes:
Single (') and double (") quotes delimit character constants. They can be
used interchangeably but double quotes are preferred (and
character constants are printed using double quotes), so single
quotes are normally only used to delimit character constants
containing double quotes.
Almost always, other [i.e., non-syntactically valid] names can be used
provided they are quoted. The preferred quote is the backtick
(‘`’) ... under many
circumstances single or double quotes can be used (as a character
constant will often be converted to a name). One place where
backticks may be essential is to delimit variable names in
formulae: see ‘formula’.
For example, if you want to define a variable name containing a space, you need back-ticks:
`a b` <- 1
Double quotes also work here (to my surprise!)
"a b" <- 1
but if you want to use the resulting variable in an expression you'll need to use back-ticks. "a b" + 1 gives an error ("non-numeric argument to binary operator") but `a b` + 1 works.
As @r2evans points out, the same rules apply in tidyverse expressions. You can use double or single quotes (if you want) to define new variables: mtcars %>% mutate("my stuff" = 4), but if you want to subsequently use that variable (or any other non-syntactic variable) in an expression, you have to backtick-protect it: mtcars %>% mutate("my stuff" = 4, new = `my stuff` + 5).
It's probably best practice/least confusing to just use backticks for all non-syntactic variable reference and single quotes for character constants.
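To make the distinction concrete, here is a small base R sketch (no tidyverse needed). The data frame and its column names are invented for the example:

```r
# A data frame with a non-syntactic column name
# (check.names = FALSE stops R from rewriting it to my.stuff).
df <- data.frame(`my stuff` = c(1, 2, 3), y = c(2, 4, 7), check.names = FALSE)

# Backticks produce a name: this refers to the column.
as_name <- with(df, `my stuff` + 5)

# Quotes produce a character constant: this is just the string itself.
as_string <- with(df, "my stuff")

# Where a string is expected, quotes are the right tool.
by_string <- df[["my stuff"]]

# In a formula, the backticks are essential.
fit <- lm(y ~ `my stuff`, data = df)
```

Here `as_name` is a numeric vector computed from the column, while `as_string` is only the text "my stuff".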
I have created a list (based on items in a column) in order to subset my dataset into smaller datasets relating to a particular variable. This list contains strings with hyphens (-) in them.
dim.list <- c('Age_CareContactDate-Gender', 'Age_CareContactDate-Group',
'Age_ServiceReferralReceivedDate-Gender',
'Age_ServiceReferralReceivedDate-Gender-0-18',
'Age_ServiceReferralReceivedDate-Group',
'Age_ServiceReferralReceivedDate-Group-ReferralReason')
I have then written some code to loop through each item in this list subsetting my main data.
for (i in dim.list) {assign(paste("df1.",i,sep=""),df[df$Dimension==i,])}
This works fine, however when I come to aggregate this in order to get some summary statistics I can't reference the dataset as R stops reading after the hyphen (I assume that the hyphen is some special character)
If I use a different list without hyphens e.g.
dim.list.abr <- c('ACCD_Gen','ACCD_Grp',
'ASRRD_Gen',
'ASRRD_Gen_0_18',
'ASRRD_Grp',
'ASRRD_Grp_RefRsn')
When my for loop above executes I get 6 data.frames with no observations.
Why is this happening?
Comment to answer:
Hyphens aren't allowed in standard variable names. Think of a simple example: a-b. Is it a variable name with a hyphen or is it a minus b? The R interpreter assumes a minus b, because it doesn't require spaces for binary operations. You can force non-standard names to work using backticks, e.g.,
# terribly confusing names:
`a-b` <- 5
`x+y` <- 10
`mean(x^2)` <- "this is awful"
but you're better off following the rules and using standard names without special characters like + - * / % $ # @ ! & | ^ ( [ ' " in them. At ?Quotes there is a section on Names and Identifiers:
Identifiers consist of a sequence of letters, digits, the period (.) and the underscore. They must not start with a digit nor underscore, nor with a period followed by a digit. Reserved words are not valid identifiers.
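One way to check these rules in practice is `make.names()`, which rewrites a string into a syntactically valid name; a name that comes back unchanged was already valid. A minimal sketch (the helper `is_syntactic` is not a base R function, just a name chosen here):

```r
# A name is syntactic if make.names() leaves it unchanged.
is_syntactic <- function(x) make.names(x) == x

candidates <- c("a_b", "a.b", ".x", "a-b", "2000", "_x", ".2x", "if")
validity <- setNames(is_syntactic(candidates), candidates)
validity
# a_b, a.b and .x are valid; a-b, 2000, _x, .2x and if are not
```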
So that's why you're getting an error, but what you're doing isn't good practice. I completely agree with Axeman's comments. Use split to divide up your data frame into a list, and keep it in a list rather than using assign; it will be much easier to loop over or use lapply with that way. You might want to read my answer at How to make a list of data frames for a lot of discussion and examples.
Regarding your comment "dim.list is not the complete set of unique entries in the Dimensions column", that just means you need to subset before you split:
nice_list = df[df$Dimension %in% dim.list, ]
nice_list = split(nice_list, nice_list$Dimension)
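A self-contained sketch of the subset-then-split approach, using a toy data frame in place of the asker's data (the `Dimension` values and the `value` column are invented):

```r
# Toy stand-in for the real data; hyphens in the values are harmless
# because they are used as list names, not as variable names.
df <- data.frame(
  Dimension = c("Age-Gender", "Age-Gender", "Age-Group", "Other"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
dim.list <- c("Age-Gender", "Age-Group")

# Keep only the wanted dimensions, then split into a named list.
nice_list <- df[df$Dimension %in% dim.list, ]
nice_list <- split(nice_list, nice_list$Dimension)

# Elements are reached by string, and summaries are one sapply away.
gender_df <- nice_list[["Age-Gender"]]
totals <- sapply(nice_list, function(d) sum(d$value))
```

Because the pieces live in a list, no backticks or `assign` calls are needed to reach them.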
I would like to parse nested parentheses using R. No, this is not JSON. I have seen examples using Perl, PHP, and Python, but I am having trouble getting anything to work in R. Here is an example of some data:
(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)
I would like to split this string based on the three parent parentheses into three separate strings:
(a(a(a)(aa(a)a)a)a)
((b(b)b)b)
(((cc)c)c)
One of the challenges I am facing is the lack of a consistent structure in terms of total pairs of child parentheses within the parent parentheses, and the number of consecutive open or closed parentheses. Notice the consecutive open parentheses in the data with Bs and with Cs. This has made attempts to use regex very difficult. Also, the data within a given parent parentheses will have many common characters to other parent parentheses, so looking for all "a"s or "b"s is not possible - I fabricated this data to help people see the three parent parentheses better.
Basically I am looking for a function that identifies parent parentheses. In other words, a function that can find parentheses that are not contained within other parentheses, and return all instances of this for a given string.
Any ideas? I appreciate the help.
Here is one directly adapted from Regex Recursion with \\((?>[^()]|(?R))*\\):
s = "(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)"
matched <- gregexpr("\\((?>[^()]|(?R))*\\)", s, perl = TRUE)
substring(s, matched[[1]], matched[[1]] + attr(matched[[1]], "match.length") - 1)
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)" "(((cc)c)c)"
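The `substring` arithmetic can also be left to `regmatches()`, which extracts the matched text directly from the `gregexpr()` result. A short equivalent sketch with the same pattern and input:

```r
s <- "(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)"

# Same recursive PCRE pattern; regmatches() pulls out the matched substrings.
m <- gregexpr("\\((?>[^()]|(?R))*\\)", s, perl = TRUE)
groups <- regmatches(s, m)[[1]]
groups
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)"          "(((cc)c)c)"
```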
Assuming that the parentheses are balanced, you can try the following (this works like a PDA, a pushdown automaton, if you are familiar with the theory of computation):
str <- '(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)'
indices <- c(0, which(cumsum(sapply(unlist(strsplit(str, split='')),
function(x) ifelse(x == '(', 1, ifelse(x==')', -1, 0))))==0))
sapply(1:(length(indices)-1), function(i) substring(str, indices[i]+1, indices[i+1]))
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)" "(((cc)c)c)"
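The same depth-counting idea can be written as a small named function, which may be easier to read. This is a sketch under two assumptions: the parentheses are balanced, and no characters sit between the top-level groups. The function name `split_top_level` is made up:

```r
split_top_level <- function(x) {
  chars <- strsplit(x, "")[[1]]
  # Running nesting depth: +1 at "(", -1 at ")", unchanged otherwise.
  depth <- cumsum((chars == "(") - (chars == ")"))
  # A top-level group closes wherever a ")" brings the depth back to 0.
  ends <- which(depth == 0 & chars == ")")
  # Each group starts right after the previous one ends.
  starts <- c(1, head(ends, -1) + 1)
  substring(x, starts, ends)
}

parts <- split_top_level("(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)")
parts
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)"          "(((cc)c)c)"
```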
It seems the output is the same whichever of these I use. Is there any difference between them?
x <- "hello"
x <- 'hello'
x = "hello"
x = 'hello'
All of them seem to give the same output. Is there a difference between them, and when should each be used?
Thanks in advance!
In your examples, all four forms give the same result. But see the notes below:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html
Single and double quotes delimit character constants. They can be used
interchangeably but double quotes are preferred (and character
constants are printed using double quotes), so single quotes are
normally only used to delimit character constants containing double
quotes.
http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html
A little history before we continue: when the R language (and S before
it) was first created, <- was the only choice of assignment operator.
This is a hangover from the language APL, where the arrow notation was
used to distinguish assignment (assign the value 3 to x) from equality
(is x equal to 3?). (Professor Ripley reminds me that on APL keyboards
there was an actual key on the keyboard with the arrow symbol on it,
so the arrow was a single keystroke back then. The same was true of
the AT&T terminals first used for the predecessors of S as described
in the Blue Book.) However many modern languages (such as C, for
example) use = for assignment, so beginners using R often found the
arrow notation cumbersome, and were prone to use = by mistake. But R
uses = for yet another purpose: associating function arguments with
values (as in pnorm(1, sd=2), to set the standard deviation to 2). To
make things easier for new users familiar with languages like C, R
added the capability in 2001 to also allow = be used as an assignment
operator, on the basis that the intent (assignment or association) is
usually clear by context. So, x = 3
clearly means "assign 3 to x", whereas
f(x = 3)
clearly means "call function f, setting the argument x to 3".
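One place where the two operators do behave differently is inside a function call. A small sketch of that difference; the functions `f` and `compare_assignment` are invented for the example:

```r
f <- function(x) x * 2

compare_assignment <- function() {
  # f(x = 3): "=" inside a call names the argument; nothing is assigned here.
  named <- f(x = 3)
  x_after_named <- exists("x", envir = environment(), inherits = FALSE)

  # f(x <- 3): "<-" assigns x in this function's scope, then passes the value.
  assigned <- f(x <- 3)
  x_after_assigned <- exists("x", envir = environment(), inherits = FALSE)

  list(named = named, assigned = assigned,
       x_after_named = x_after_named, x_after_assigned = x_after_assigned)
}

res <- compare_assignment()
```

Both calls return 6, but only the second leaves a variable `x` behind in the calling scope.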
I know the basics like == and !=, or even (vaguely) the difference between & and &&. But I have no idea about things like %in% and %%, or the notation used with sprintf(), as in sprintf("%.2f", x).
Worst of all, they're hard to search for on the Internet because they're special characters and I don't know what they're called...
There are several different things going on here with the percent symbol:
Binary Operators
As several have already pointed out, things of the form %%, %in%, %*% are binary operators (respectively modulo, match, and matrix multiply), just like +, -, etc. They are functions that operate on two arguments that R recognizes as being special due to their name structure (starts and ends with a %). This allows you to use them in the form:
Argument1 %fun_name% Argument2
instead of the more traditional:
fun_name(Argument1, Argument2)
Keep in mind that the following are equivalent:
10 %% 2 == `%%`(10, 2)
"hello" %in% c("hello", "world") == `%in%`("hello", c("hello", "world"))
10 + 2 == `+`(10, 2)
R just recognizes the standard operators as well as the %x% operators as special and allows you to use them as traditional binary operators if you don't quote them. If you quote them (in the examples above with backticks), you can use them as standard two argument functions.
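Because a backtick-quoted operator is an ordinary function, it can be passed to other functions. A brief sketch (the variable names are only illustrative):

```r
total      <- Reduce(`+`, 1:4)                 # ((1 + 2) + 3) + 4
membership <- sapply(1:3, `%in%`, c(2, 3))     # is each of 1, 2, 3 in c(2, 3)?
remainder  <- do.call(`%%`, list(10, 3))       # same as 10 %% 3
```

Here `total` is 10, `membership` is FALSE TRUE TRUE, and `remainder` is 1.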
Custom Binary Operators
The big difference between the standard binary operators and %x% operators is that you can define custom binary operators and R will recognize them as special and treat them as binary operators:
`%samp%` <- function(e1, e2) sample(e1, e2)
1:10 %samp% 2
# [1] 1 9
Here we defined a binary operator version of the sample function.
"%" (Percent) as a token in special functions
The meaning of "%" in functions like sprintf or format is completely different and has nothing to do with binary operators. The key thing to note is that in those functions the % character is part of a quoted string, and not a standard symbol on the command line (i.e. "%" and % are very different). In the context of sprintf, inside a string, "%" is a special character used to recognize that the subsequent characters have a special meaning and should not be interpreted as regular text. For example, in:
sprintf("I'm a number: %.2f", runif(3))
# [1] "I'm a number: 0.96" "I'm a number: 0.74" "I'm a number: 0.99"
"%.2f" means a floating point number (f) to be displayed with two decimals (.2). Notice how the "I'm a number: " piece is interpreted literally. The use of "%" allows sprintf users to mix literal text with special instructions on how to represent the other sprintf arguments.
The R Language Definition, section 3.1.4 refers to them as "special binary operators". One of the ways they're special is that users can define new binary operators using the %x% syntax (where x is any valid name).
The Writing your own functions section of An Introduction to R, refers to them as Binary Operators (which is somewhat confusing because + is also a binary operator):
10.2 Defining new binary operators
Had we given the bslash() function a different name, namely one of the
form
%anything%
it could have been used as a binary operator in expressions
rather than in function form. Suppose, for example, we choose ! for
the internal character. The function definition would then start as
> "%!%" <- function(X, y) { ... }
(Note the use of quote marks.) The function could then be used as X %!% y. (The backslash symbol itself
is not a convenient choice as it presents special problems in this
context.)
The matrix multiplication operator, %*%, and the outer product matrix
operator %o% are other examples of binary operators defined in this
way.
They don’t have a special name as far as I know. They are described in R operator syntax and precedence.
The %anything% operators are just normal functions, which can be defined by yourself. You do need to put the name of the operator in backticks (`…`), though: this is how R treats special names.
`%test%` = function (a, b) a * b
2 %test% 4
# 8
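User-defined operators can be chained, and all %any% operators share a single precedence level that binds tighter than * and +. A small sketch; the operator names %cat% and %add% are made up for illustration:

```r
`%cat%` <- function(e1, e2) paste0(e1, e2)
`%add%` <- function(e1, e2) e1 + e2

chained <- "a" %cat% "b" %cat% "c"   # evaluated left to right
grouped <- 2 * 3 %add% 4             # parsed as 2 * (3 %add% 4)
```

Here `chained` is "abc" and `grouped` is 14, not 10, so parentheses are worth adding when a custom operator is mixed with arithmetic.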
The sprintf format strings are entirely unrelated; they are not operators at all. Instead, they are just the conventional C-style format strings.
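A few of the more common conversion specifications, as a quick sketch (the values are arbitrary):

```r
mixed   <- sprintf("%d items cost %.2f (%s)", 3L, 4.5, "approx")  # integer, float, string
percent <- sprintf("%5.1f%%", 12.345)   # width 5, one decimal; %% prints a literal %
padded  <- sprintf("%03d", 7L)          # zero-padded to width 3
```

These give "3 items cost 4.50 (approx)", " 12.3%", and "007" respectively. Note that %d expects an integer value such as 3L.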
The help file, and the general entry, is indeed a good starting point: ?'%in%'
For example, you can see how the operator '%in%' is defined:
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
You can even create your own operators:
'%ni%' <- Negate('%in%')
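A short usage sketch of the negated operator, written with backticks (the input vectors are arbitrary):

```r
`%ni%` <- Negate(`%in%`)

not_in <- c(1, 5) %ni% c(1, 2, 3)   # is each left-hand value absent from the right?
not_in
# [1] FALSE  TRUE
```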