difference between <- and = in R with an example [duplicate] - r

This question already has answers here:
What are the differences between "=" and "<-" assignment operators?
(9 answers)
Closed 3 years ago.
I was wondering if there is a technical difference between the assignment operators "=" and "<-" in R. So, does it make any difference if I use:
Example 1: a = 1 or a <- 1
Example 2: a = c(1:20) or a <- c(1:20)
Thanks for your help
Sven

Yes there is. This is what the help page of '=' says:
The operators <- and = assign into the
environment in which they are
evaluated. The operator <- can be used
anywhere, whereas the operator = is
only allowed at the top level (e.g.,
in the complete expression typed at
the command prompt) or as one of the
subexpressions in a braced list of
expressions.
By "can be used", the help file means used to assign an object. Inside a function call you can't assign an object with =, because there = binds a value to an argument.
Basically, if you use <- then you assign a variable that you will be able to use in your current environment. For example, consider:
matrix(1,nrow=2)
This just makes a 2 row matrix. Now consider:
matrix(1,nrow<-2)
This also gives you a two-row matrix, but now we also have an object called nrow whose value is 2! In the second call we didn't set the argument nrow to 2; we assigned 2 to an object named nrow and passed the result as the second argument of matrix, which happens to be nrow.
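The side effect described above can be checked directly. The following is a minimal sketch, assuming a scratch environment (the name env is made up for illustration) so that the result does not depend on what is already defined in your workspace:

```r
# Evaluate both calls in a scratch environment and inspect what each one creates.
env <- new.env()

# Named argument: `=` only binds 2 to matrix()'s nrow parameter.
m1 <- evalq(matrix(1, nrow = 2), env)
before_arrow <- exists("nrow", envir = env, inherits = FALSE)  # no variable created

# Assignment inside the call: `<-` creates nrow in env, then passes 2 positionally.
m2 <- evalq(matrix(1, nrow <- 2), env)
after_arrow <- exists("nrow", envir = env, inherits = FALSE)   # variable now exists
nrow_value <- get("nrow", envir = env)

# Both calls build the same 2 x 1 matrix; only the side effect differs.
same_matrix <- identical(m1, m2)
```

Here before_arrow should be FALSE and after_arrow TRUE, while same_matrix is TRUE.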
Edit:
As for the edited question: both are the same. The use of = or <- can cause a lot of discussion as to which one is best. Many style guides advocate <- and I agree with that, but do keep spaces around <- assignments or they can become quite hard to interpret. If you don't use spaces (you should, except on Twitter), I prefer =, and never use ->!
But really it doesn't matter what you use as long as you are consistent in your choice. Using = on one line and <- on the next results in very ugly code.


Recoding a discrete variable

I have a discrete variable with scores from 1-3. I would like to change it so 1=2, 2=1, 3=3.
I have tried
recode(Data$GEB43, "c(1=2; 2=1; 3=3")
But that doesn't work.
I know this is an overly stupid question that can be solved in excel within seconds but trying to learn how to do basics like this in R.
We should always provide a minimal reproducible example:
df <- data.frame(x=c(1,1,2,2,3,3))
You didn't specify the package for recode, so I assumed dplyr. ?dplyr::recode tells us how the arguments should be passed to the function. In the original question "c(1=2; 2=1; 3=3" is a string (i.e. not an R expression but a character string "c(1=2; 2=1; 3=3"). To make it an R expression we have to get rid of the double quotes and replace the ; with ,. Additionally, we need a closing parenthesis, i.e. c(1=2, 2=1, 3=3). But still, as ?dplyr::recode tells us, this is not the way to pass this information to recode.
Solution using dplyr::recode:
dplyr::recode(df$x, "1"=2, "2"=1, "3"=3)
Returns:
[1] 2 2 1 1 3 3
Assuming you mean dplyr::recode, the syntax is
recode(.x, ..., .default = NULL, .missing = NULL)
The documentation says:
.x - A vector to modify
... - Replacements. For character and factor .x, these should be named and replacement is based only on their name. For numeric .x, these can be named or not. If not named, the replacement is done based on position i.e. .x represents positions to look for in replacements
So when you have numeric value you can replace based on position directly
recode(1:3, 2, 1, 3)
#[1] 2 1 3
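For comparison, the same recoding can be sketched in base R without dplyr. This is an alternative technique, not what the answers above use, and the names x, lookup, by_name and by_position are made up for illustration:

```r
# Example scores, matching the reproducible data frame used above.
x <- c(1, 1, 2, 2, 3, 3)

# Name-based lookup: old values are the names, new values are the elements.
lookup <- c("1" = 2, "2" = 1, "3" = 3)
by_name <- unname(lookup[as.character(x)])

# Position-based lookup: element i of the vector is the replacement for score i.
by_position <- c(2, 1, 3)[x]

by_name      # 2 2 1 1 3 3
by_position  # 2 2 1 1 3 3
```

The position-based form only works when the scores are small positive integers, which is the case for a 1-3 scale.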

R: parse nested parentheses

I would like to parse nested parentheses using R. No, this is not JSON. I have seen examples using perl, php, and python, but I am having trouble getting anything to work in R. Here is an example of some data:
(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)
I would like to split this string based on the three parent parentheses into three separate strings:
(a(a(a)(aa(a)a)a)a)
((b(b)b)b)
(((cc)c)c)
One of the challenges I am facing is the lack of a consistent structure in terms of total pairs of child parentheses within the parent parentheses, and the number of consecutive open or closed parentheses. Notice the consecutive open parentheses in the data with Bs and with Cs. This has made attempts to use regex very difficult. Also, the data within a given parent parentheses will have many common characters to other parent parentheses, so looking for all "a"s or "b"s is not possible - I fabricated this data to help people see the three parent parentheses better.
Basically I am looking for a function that identifies parent parentheses. In other words, a function that can find parentheses that are not contained within other parentheses, and return all instances of this for a given string.
Any ideas? I appreciate the help.
Here is one directly adapted from Regex Recursion with \\((?>[^()]|(?R))*\\):
s = "(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)"
matched <- gregexpr("\\((?>[^()]|(?R))*\\)", s, perl = TRUE)
substring(s, matched[[1]], matched[[1]] + attr(matched[[1]], "match.length") - 1)
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)" "(((cc)c)c)"
Assuming that the parentheses are balanced, you can try the following (this works like a PDA, a pushdown automaton, if you are familiar with the theory of computation):
str <- '(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)'
indices <- c(0, which(cumsum(sapply(unlist(strsplit(str, split='')),
function(x) ifelse(x == '(', 1, ifelse(x==')', -1, 0))))==0))
sapply(1:(length(indices)-1), function(i) substring(str, indices[i]+1, indices[i+1]))
# [1] "(a(a(a)(aa(a)a)a)a)" "((b(b)b)b)" "(((cc)c)c)"
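The depth-counting idea from the second answer can be wrapped in a small function for readability. This is only a sketch: split_top_level is a hypothetical name, and it assumes the parentheses are balanced and that no text sits outside the top-level groups:

```r
# Split a string into its top-level parenthesised groups by tracking nesting depth.
split_top_level <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # +1 for every "(", -1 for every ")"; the running sum is the current depth.
  depth <- cumsum((chars == "(") - (chars == ")"))
  # A top-level group ends wherever a ")" brings the depth back to zero.
  ends <- which(depth == 0 & chars == ")")
  if (length(ends) == 0) return(character(0))
  # Each group starts right after the previous group's end.
  starts <- c(1, head(ends, -1) + 1)
  substring(s, starts, ends)
}

parts <- split_top_level("(a(a(a)(aa(a)a)a)a)((b(b)b)b)(((cc)c)c)")
parts
```

For the example string this should return the same three groups as the answers above.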

"x" and 'x' , x <- 5 and x = 5 are same in R? [duplicate]

It seems output is same when I use any of the two. Is there any difference between them?
x <- "hello"
x <- 'hello'
x = "hello"
x = 'hello'
It seems all are giving same output. Is there difference between them? and when to use them?
Thanks in advance!
In your examples, the answer is yes. But see notes below:
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html
Single and double quotes delimit character constants. They can be used
interchangeably but double quotes are preferred (and character
constants are printed using double quotes), so single quotes are
normally only used to delimit character constants containing double
quotes.
http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html
A little history before we continue: when the R language (and S before
it) was first created, <- was the only choice of assignment operator.
This is a hangover from the language APL, where the arrow notation was
used to distinguish assignment (assign the value 3 to x) from equality
(is x equal to 3?). (Professor Ripley reminds me that on APL keyboards
there was an actual key on the keyboard with the arrow symbol on it,
so the arrow was a single keystroke back then. The same was true of
the AT&T terminals first used for the predecessors of S as described
in the Blue Book.) However many modern languages (such as C, for
example) use = for assignment, so beginners using R often found the
arrow notation cumbersome, and were prone to use = by mistake. But R
uses = for yet another purpose: associating function arguments with
values (as in pnorm(1, sd=2), to set the standard deviation to 2). To
make things easier for new users familiar with languages like C, R
added the capability in 2001 to also allow = be used as an assignment
operator, on the basis that the intent (assignment or association) is
usually clear by context. So, x = 3
clearly means "assign 3 to x", whereas
f(x = 3)
clearly means "call function f, setting the argument x to 3".
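A quick check of both points, as a minimal sketch with made-up variable names:

```r
# Single and double quotes produce the same character constant.
a <- "hello"
b <- 'hello'
same_string <- identical(a, b)

# Single quotes are handy when the text itself contains double quotes.
quoted <- 'She said "hi"'
same_quoted <- quoted == "She said \"hi\""

# At the top level, <- and = both assign.
x <- 5
y = 5
same_value <- identical(x, y)
```

All three comparisons should come out TRUE.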

Use of $ and %% operators in R

I have been working with R for about 2 months and have had a little bit of trouble getting a handle on how the $ and %% operators work.
I understand I can use the $ operator to pull a certain value from a function (e.g. t.test(x)$p.value), but I'm not sure if this is a universal definition. I also know it can be used to pull out specific data.
I'm also curious about the use of the %% operator, in particular when a value is placed between the percent signs (e.g. %x%). I am aware of using it as a modulo or remainder operator, e.g. 7 %% 5 returns 2. Perhaps I am being ignorant and this is not real?
Any help or links to literature would be greatly appreciated.
Note: I have been searching for this for a couple hours so excuse me if I couldn't find it!
You are not really pulling a value from a function but rather from the list object that the function returns. $ is actually an infix operator that takes two arguments, the values preceding and following it. It is a convenience function that uses non-standard evaluation of its second argument. It's called non-standard because the unquoted characters following $ are first quoted before being used to extract a named element from the first argument.
t.test # is the function
t.test(x) # is a named list with one of the names being "p.value"
The value can be pulled in one of three ways:
t.test(x)$p.value
t.test(x)[['p.value']] # numeric vector
t.test(x)['p.value'] # a list with one item
my.name.for.p.val <- 'p.value'
t.test(x)[[ my.name.for.p.val ]]
When you surround a set of characters with flanking "%" signs you can create your own vectorized infix function. If you wanted a pmax for which the default was na.rm=TRUE, do this:
'%mypmax%' <- function(x,y) pmax(x,y, na.rm=TRUE)
And then use it without quotes:
> c(1:10, NA) %mypmax% c(NA,10:1)
[1] 1 10 9 8 7 6 7 8 9 10 1
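The extraction forms above can be compared side by side. This is a minimal sketch; sample_values is made-up data used only so that t.test has something to run on:

```r
# Made-up sample, only for illustration.
sample_values <- c(5.1, 4.9, 6.2, 5.8, 5.5)
res <- t.test(sample_values)

p_dollar  <- res$p.value         # $ quotes the name for you
p_double  <- res[["p.value"]]    # same element, name given as a string
p_single  <- res["p.value"]      # a list of length one, not the bare number
p_prefix  <- `$`(res, "p.value") # $ called like an ordinary function

same_value <- identical(p_dollar, p_double) && identical(p_dollar, p_prefix)
single_is_list <- is.list(p_single)
```

The backtick form is rarely written by hand; it is shown only to illustrate that $ is a function of two arguments.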
First, the $ operator is for selecting an element of a list. See help('$').
The %% operator is the modulo operator. See help('%%').
The '$' operator is used to select a particular element from a list, or from any other data structure that contains named components.
For example: data is a list which contains a matrix named MATRIX and other things too.
To get the matrix we write:
print(data$MATRIX)
The %% operator is the modulus operator, which gives the remainder.
For example: print(7%%3)
will print 1 as output.
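A few quick checks of %% next to two other built-in operators written between percent signs, %/% and %in%, as a small sketch:

```r
r1 <- 7 %% 5               # remainder of 7 / 5
r2 <- 7 %% 3               # remainder of 7 / 3
q  <- 7 %/% 3              # integer division
parity <- c(10, 11, 12) %% 2   # vectorised: 0 for even, 1 for odd
found <- 3 %in% c(1, 2, 3)     # membership test, another %...% infix operator
```

Here r1 is 2 and r2 is 1, matching the examples in the question and answers.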

Least occurring element in vector R

If I have a vector
vec = c('a','a','a','b','b','c','c','c','c','c')
Is there a simple way to find the least occurring element in vec? Thanks!
Edit: is there a simple way to do it with characters?
This should work, even if more than one of the elements is tied as the least frequent item:
vec = c(1,1,1,2,2,3,3,3,3,3)
f <- table(vec)
as.numeric(names(f[f == min(f)]))
# [1] 2
table(vec)[which.min(table(vec))]
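For the character vector from the question, the same table-based idea can be sketched without the as.numeric conversion. The names counts, least and tied are made up for illustration:

```r
vec <- c('a','a','a','b','b','c','c','c','c','c')

# Count each distinct value, then keep every name whose count equals the minimum.
counts <- table(vec)
least <- names(counts)[as.vector(counts) == min(counts)]
least  # "b"

# Ties are all returned, in the sorted order that table() uses.
tied_counts <- table(c("x", "x", "y", "z"))
tied <- names(tied_counts)[as.vector(tied_counts) == min(tied_counts)]
tied   # "y" "z"
```

Unlike which.min, which returns only the first minimum, this keeps every tied value.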
(In all likelihood a duplicate, although I have searched. Found what seemed to be similar on the max side: Create a variable capturing the most frequent occurence by group Maybe it sounds familiar to that one 'cuz I posted an answer?)
