What's the difference between the in and the %in% operator in R? Why do I sometimes need the percentage signs and other times I do not?
The 3 following objects are all functions :
identity
%in%
for
We can call them this way :
`identity`(1)
#> [1] 1
`%in%`(1, 1:2)
#> [1] TRUE
`for`(x, seq(3), print("yes"))
#> [1] "yes"
#> [1] "yes"
#> [1] "yes"
But usually we don't!
"identity" is syntactic (i.e. it's a "regular" name, doesn't contain weird symbols etc), AND it is not a protected word so we can skip the tick marks and call just :
identity(1)
%in% is not syntactic, but it starts and ends with "%", so it can be used in infix form. You could define your own `%fun%` <- function(x, y) ... and use it this way too. So we would call:
1 %in% 1:2
for is a control flow construct, like if, while and repeat. All of those are functions with a given number of arguments, but the language provides more convenient ways to call them than the above. Here we'd do:
for (x in seq(3)) print("yes")
in is just used to parse the code; it's not a function here (just like else isn't either).
?`%in%` will show you what the function does.
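To make the points above concrete, here is a minimal sketch. The operator name %+% is hypothetical and chosen only for illustration; the rest uses base R to inspect how a for loop is parsed.

```r
# Hypothetical infix operator (the name %+% is an assumption for illustration)
`%+%` <- function(x, y) paste0(x, y)
joined <- "foo" %+% "bar"
print(joined)
#> [1] "foobar"

# A for loop is parsed into a call to the function `for` with three arguments;
# `in` does not appear anywhere in the parsed call
loop_call <- quote(for (x in seq(3)) print("yes"))
parts <- as.list(loop_call)
print(length(parts))
#> [1] 4
print(identical(parts[[1]], as.name("for")))
#> [1] TRUE
```

The four parts are the function `for`, the loop variable, the sequence and the body, which matches the backtick call form shown earlier.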
Depending on how you define it, there is no in operator in R, only an %in% operator. Instead, in is “syntactic sugar” as part of the syntax for the for loop.
By contrast, %in% is an actual operator defined in R which tests whether each element of the left-hand expression is contained in the right-hand expression. Like other operators in R, %in% is a regular function and can be called as such:
if (`%in%`(x, seq(3, 5))) message("yes")
… or it can be redefined:
`%in%` = function (x, table) {
  message("I redefined %in%!")
  match(x, table, nomatch = 0L) > 0L
}
if (5 %in% 1 : 10) message("yes")
# I redefined %in%!
# yes
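A redefinition like this only masks the base version; it does not change it. A small sketch, assuming the redefinition was made at the top level of the session: removing the user-defined copy brings back the original behaviour, and base::`%in%` reaches the original at any time.

```r
# Mask the base operator with a user-defined version
`%in%` <- function(x, table) {
  message("I redefined %in%!")
  match(x, table, nomatch = 0L) > 0L
}

# The base version stays reachable through its namespace
via_base <- base::`%in%`(5, 1:10)

# Remove the user-defined copy so that lookup finds the base operator again
rm("%in%")
restored <- identical(`%in%`, base::`%in%`)
print(restored)
#> [1] TRUE
```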
Usage-wise, I have figured out the answer: I can only use in when looping over the elements of something, and %in% for checking whether something is contained in something else, e.g.
for (x in seq(3)){
  if (x %in% seq(3,5)) print("yes")
}
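Note that %in% is vectorised over its left-hand side, which is part of what sets it apart from the in of a for loop. A brief sketch with made-up values:

```r
x <- c(1, 4, 7)

# One TRUE/FALSE per element of x
hits <- x %in% 3:5
print(hits)
#> [1] FALSE  TRUE FALSE

# The logical vector can be used directly for subsetting
print(x[hits])
#> [1] 4
```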
I want to adjust my function so that my if and else if statements recognize the name of the dataframe used and execute the correct plotting function. These are some mock data structured the same as mine:
df1<-data.frame(A=c(1,2,2,3,4,5,1,1,2,3),
B=c(4,4,2,3,4,2,1,5,2,2),
C=c(3,3,3,3,4,2,5,1,2,3),
D=c(1,2,5,5,5,4,5,5,2,3),
E=c(1,4,2,3,4,2,5,1,2,3),
dummy1=c("yes","yes","no","no","no","no","yes","no","yes","yes"),
dummy2=c("high","low","low","low","high","high","high","low","low","high"))
df1[colnames(df1)] <- lapply(df1[colnames(df1)], factor)
vals <- colnames(df1)[1:5]
dummies <- colnames(df1)[-(1:5)]
step1 <- lapply(dummies, function(x) df1[, c(vals, x)])
step2 <- lapply(step1, function(x) split(x, x[, 6]))
names(step2) <- dummies
tbls <- unlist(step2, recursive=FALSE)
tbls<-lapply(tbls, function(x) x[(names(x) %in% names(df1[c(1:5)]))])
A<-lapply(tbls,"[", c(1,2))
B<-lapply(tbls,"[", c(3,4))
C<-lapply(tbls,"[", c(3,4))
list<-list(A,B,C)
names(list)<-c("A","B","C")
And this is my function:
plot_1<-function (section, subsample) {
data<-list[grep(section, names(list))]
data<-data[[1]]
name=as.character(names(data))
if(section=="A" && subsample=="None"){plot_likert_general_section(df1[c(1:2)],"A")}
else if (section==name && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the",name,"topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the",name,"topics rank?"))}
}
Basically, what I want it to do is plot a certain graph when I specify the section and subsample I'm interested in. If, for example, I want to plot section C and subsample dummy1, I just write:
plot_1(section="C", subsample="dummy1")
I want to avoid writing this:
else if (section=="A" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the A topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the A topics rank?"))}
else if (section=="B" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the B topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the B topics rank?"))}
else if (section=="C" && subsample=="dummy1"){plot_likert(data$dummy1.yes, title=paste("How do the c topics rank?"));plot_likert(data$Ldummy1.no, title = paste("How do the C topics rank?"))}
else if (section=="C" && subsample=="dummy2")...
.
.
}
So I tried to extract the dataframe used from the list so that it matches the string of the section typed in the function (data<-list[grep(section, names(list))]) and store its name as a character (name=as.character(names(data))), because I thought that in this way the function would have recognized the string "A", "B" or "C" by itself, without the need for me to specify each condition.
However, if I run it, I get this error: Warning message: In section == name && subsample == "dummy1" : 'length(x) = 4 > 1' in coercion to 'logical(1)', that, from what I understand, is due to the presence of a vector in the statement. But I have no idea how to correct for this (I'm still quite new to R).
How can I fix the function so that it does what I want? Thanks in advance!
Well, I can't really test your code without the plot_likert_general_section function or the plot_likert function, but I've done a bit of simplifying and best practices--passing list in as an argument, consistent spaces and assignment operators, etc.--and this is my best guess as to what you want:
plot_1 = function(list, section, subsample) { ## added `list` as an argument
  data = list[[grep(section, names(list))]] # use [[ to extract a single item
  # names(data) holds one name per sub-table, so build the title from `section`
  title = paste("How do the", section, "topics rank?")
  if (subsample == "None") {
    plot_likert_general_section(df1[c(1:2)], section)
  } else {
    yesno = paste(subsample, c("yes", "no"), sep = ".")
    plot_likert(data[[yesno[1]]], title = title)
    plot_likert(data[[yesno[2]]], title = title)
  }
}
plot_1(list, section = "C", subsample = "dummy1")
I'm not sure if your plot_likert functions use base or grid graphics, but either way you'll need to handle the multiple plots. With base, probably use par(mfrow = ...) to display both of them; with grid, I'd suggest putting them in a list to return them both, and then maybe using gridExtra::grid.arrange() (or similar) to plot both of them.
You're right that the error is due to passing a vector where a single value is expected. Try inserting print statements before the equality test to diagnose why this is.
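As a rough illustration of that diagnosis: the element names below are what I'd expect names(data) to contain for the mock data (an assumption, since they depend on how split() and unlist() build the names). Comparing a single string against them produces one result per name, which is more than if and && accept.

```r
section <- "A"
# Assumed contents of names(data) for the mock data in the question
name <- c("dummy1.no", "dummy1.yes", "dummy2.high", "dummy2.low")

cmp <- section == name
print(length(cmp))   # one comparison result per element of `name`
#> [1] 4

# Reduce to a single TRUE/FALSE before using the result in if () or &&
print(any(cmp))
#> [1] FALSE
print(length(section %in% name))
#> [1] 1
```

Here the comparison is never TRUE anyway, because name holds the sub-table names rather than the section label.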
Also, be careful with choosing variable names like name or list that clash with names already used in base R (e.g. ?name). I'd also recommend following the tidyverse style guide here: https://style.tidyverse.org/.
I have the following function:
foo <- function(...){
  dots <- list(...)
  response <- dots[[1]]
  if(is(dots[[2]],'list') == TRUE){print('yes')} else print('no')
}
This produces the following output:
foo('yes'):
Error in dots[[2]] : subscript out of bounds
How can I refer to a parameter that has not been supplied yet, so that the function can branch on whether it is present? For example, when it is present I would do some work based on it; otherwise, the part of the function that uses it won't run.
However, R wants me to at least supply enough values for dots to be indexed.
For example, if I wanted to use just:
foo('yes')
>Error in dots[[2]] : subscript out of bounds
#otherwise
foo('yes',c('some','list'))
>'yes'
I want to be able to run foo('yes') and for it to print no. Essentially, some parameters won't get used in the function, and so in this case when it's not assigned anything then run the else statement.
Picking up on @Rui Barradas's and @Allan Cameron's comments, I can achieve the same behaviour as function(pred=NULL,...) by using:
foo <- function(...){
  dots <- list(...)
  response <- dots[[1]]
  print(response)
  if(length(dots) > 1){
    if(is(dots[[2]],'list') == TRUE){
      print('yes')
    } else print('no')
  } else if (length(dots) == 1){
    dots[[2]] = NULL
  }
}
Results:
> foo('yes',list(1, 2, 3))
[1] "yes"
> foo('yes')
[1] "yes"
Are there any cleaner alternatives to this that reduce the amount of code? My approach produces quite some clutter. The only issue I have with this is that if I wanted dots[[3]], I would have to implement further conditionals to access it or set it to NULL.
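One possible way to cut the clutter, sketched under the assumption that a missing argument should simply be treated as NULL: either give the optional argument a name with a NULL default, as the comments suggested, or look up dots through a small helper that returns NULL when the index is out of range. The names foo2, foo3, extra and dot_or_null are hypothetical, and is.list() is swapped in for is(x, 'list') to keep the sketch in base R.

```r
# Variant 1: a named optional argument with a NULL default
foo2 <- function(response, extra = NULL, ...) {
  if (is.list(extra)) "yes" else "no"
}

# Variant 2: safe positional access to ..., returning NULL past the end
dot_or_null <- function(dots, i) {
  if (length(dots) >= i) dots[[i]] else NULL
}
foo3 <- function(...) {
  dots <- list(...)
  if (is.list(dot_or_null(dots, 2))) "yes" else "no"
}

print(foo2("yes"))
#> [1] "no"
print(foo2("yes", list(1, 2, 3)))
#> [1] "yes"
print(foo3("yes"))
#> [1] "no"
print(foo3("yes", list(1, 2, 3)))
#> [1] "yes"
```

With the helper, reaching for a third element is just dot_or_null(dots, 3), with no extra conditionals.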
I am using the ifelse function to obtain either a vector of NA, if all the values of this vector are NA, or a vector with all the values not equal to "NA_NA". In my example, I would like to obtain this result:
[1] "14_mter" "78_ONHY"
but I am obtaining this
[1] "14_mter"
my example:
vect=c("NA_NA", "14_mter", "78_ONHY")
out=ifelse(all(is.na(vect)), vect, vect[which(vect!="NA_NA")])
What is wrong in this function ?
ifelse is vectorized and its result is as long as the test argument. all(is.na(vect)) always has length one, hence the result. A regular if/else clause is fine here.
vect <- c("NA_NA", "14_mter", "78_ONHY")
if (all(is.na(vect))) {
  out <- vect
} else {
  out <- vect[vect != "NA_NA"]
}
out
#> [1] "14_mter" "78_ONHY"
additional note: no need for the which() here
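To see the length rule in action, here is a small sketch using the vector from the question:

```r
vect <- c("NA_NA", "14_mter", "78_ONHY")

test <- all(is.na(vect))   # a single FALSE: the strings are not real NA values
print(length(test))
#> [1] 1

# ifelse() returns one element because `test` has length one
res <- ifelse(test, vect, vect[vect != "NA_NA"])
print(res)
#> [1] "14_mter"
```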
The ifelse help file, referring to its three arguments test, yes and no, says:
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
so if the test has a length of 1, which is the case for the code in the question, then the result will also have length 1. Instead try one of these.
1) Use if instead of ifelse. if returns the value of the chosen leg so just assign that to out.
out <- if (all(is.na(vect))) vect else vect[which(vect != "NA_NA")]
2) The collapse package has an allNA function so a variation on (1) is:
library(collapse)
out <- if (allNA(vect)) vect else vect[which(vect != "NA_NA")]
3) Although not recommended if you really wanted to use ifelse it could be done by wrapping each leg in list(...) so that the condition and two legs all have the same length, i.e. 1.
out <- ifelse(all(is.na(vect)), list(vect), list(vect[which(vect != "NA_NA")])) |>
unlist()
If the NA value is always the string NA_NA, this works:
grep("NA_NA", vect, value = TRUE, invert = TRUE)
[1] "14_mter" "78_ONHY"
While the pattern matches the NA_NA value, the invert = TRUE argument negates the match(es) and returns the unmatched values.
Data:
vect=c("NA_NA", "14_mter", "78_ONHY")
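One caveat, shown as a hedged sketch: the pattern "NA_NA" matches anywhere in a string, so a value that merely contains it is dropped as well. The third element below is invented for illustration. Anchoring the pattern with ^ and $ restricts the match to the exact string.

```r
v <- c("NA_NA", "14_mter", "xx_NA_NA_yy")  # last value is made up

# Unanchored pattern: anything containing NA_NA is removed
loose <- grep("NA_NA", v, value = TRUE, invert = TRUE)
print(loose)
#> [1] "14_mter"

# Anchored pattern: only the exact string NA_NA is removed
exact <- grep("^NA_NA$", v, value = TRUE, invert = TRUE)
print(exact)
#> [1] "14_mter"     "xx_NA_NA_yy"
```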
B <- 10000
results <- replicate(B, {
  hand <- sample(hands1, 2)
  (hand[1] %in% aces & hand[2] %in% facecard) | (hand[2] %in% aces & hand[1] %in% facecard)
})
mean(results)
This piece of code works perfectly and does the desired thing.
This is a Monte Carlo simulation. I don't understand the way they put curly brackets {} in the replicate function. I can understand what the code does, but I can't understand the way the code is written.
The reason is that we have multiple expressions
hand <- sample(hands1, 2)
is the first expression and the second is
(hand[1] %in% aces & hand[2] %in% facecard) | (hand[2] %in% aces & hand[1] %in% facecard)
i.e. if there is only a single expression, we don't need to wrap it in a {} block.
This is a general rule and is not specific to replicate, i.e. if we use a for loop with a single expression, it doesn't need any {}:
for(i in 1:5)
  print(i)
and similarly, something like if/else
n <- 5
if(n == 5)
  print(n)
Braces are only needed when there is more than one expression.
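A short sketch of why the block works as an argument: { } evaluates its expressions in order and returns the value of the last one, so replicate() receives a single expression. The dice example is made up and stands in for the card simulation.

```r
# A block evaluates to the value of its last expression
value <- {
  a <- 2
  b <- 3
  a * b
}
print(value)
#> [1] 6

# The same idea inside replicate(): two expressions, one block
set.seed(1)
results <- replicate(1000, {
  roll <- sample(1:6, 2, replace = TRUE)
  sum(roll) == 7
})
print(length(results))
#> [1] 1000
```

mean(results) then gives the simulated proportion of rolls that sum to 7, just as mean(results) does in the card example.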