This question already has answers here:
What is the difference between `%in%` and `==`?
(3 answers)
Closed 5 years ago.
My question concerns the practical difference between the == and %in% operators in R.
I have run into an instance at work where filtering with either operator gives different results (e.g. one results in 800 rows, and the other in 1200). I have run into this problem in the past and have been able to validate the results so that I get what I want. However, I am still stumped about how the two operators differ.
Can someone please shed some light on how these operators are different?
%in% is value matching. help('%in%') documents it together with match(), which "returns a vector of the positions of (first) matches of its first argument in its second"; %in% itself returns a logical vector indicating whether each element of its left operand has a match. This means you can compare vectors of different lengths to see whether each element of one vector matches at least one element of the other. The length of the output equals the length of the vector being looked up (the first one).
1:2 %in% rep(1:2,5)
#[1] TRUE TRUE
rep(1:2,5) %in% 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#Note: this output is longer because the first argument is longer
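As a minimal sketch of the relationship described in help('%in%'), the example below compares match() with %in% on made-up vectors x and table. The help page defines %in% as match(x, table, nomatch = 0) > 0, which the last step reproduces.

```r
x <- c(2, 5, 7)       # hypothetical values to look up
table <- c(7, 1, 2)   # hypothetical lookup vector

# match(): position of the first match in `table`, NA when there is none
positions <- match(x, table)
print(positions)   # 3 NA 1

# %in%: one TRUE/FALSE per element of x
found <- x %in% table
print(found)       # TRUE FALSE TRUE

# %in% is match() with non-matches turned into 0, then tested for > 0
same <- identical(found, match(x, table, nomatch = 0L) > 0L)
print(same)        # TRUE
```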
== is a comparison operator that tests whether two things are exactly equal. If the vectors are of equal length, elements are compared element-wise. If not, the shorter vector is recycled. The length of the output equals the length of the longer vector.
1:2 == rep(1:2,5)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
rep(1:2,5) == 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
1:10 %in% 3:7
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
#is the same as
sapply(1:10, function(a) any(a == 3:7))
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
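NOTE: If possible, use identical() or all.equal() instead of == when you need a single TRUE/FALSE answer for whole objects; all.equal() also tolerates small floating-point differences.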
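A short illustration of the note above, with illustrative values only: == works element by element and is exact, identical() returns a single TRUE/FALSE for whole objects, and all.equal() allows a small numerical tolerance. all.equal() is wrapped in isTRUE() because, when the objects differ, it returns a character description of the differences rather than FALSE.

```r
# Exact floating-point comparison can surprise you
print(0.1 + 0.2 == 0.3)                    # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))   # TRUE, within tolerance

# == returns one result per element; identical() one result per object
print(c(1, 2) == c(1, 2))                  # TRUE TRUE
print(identical(c(1, 2), c(1, 2)))         # TRUE
print(identical(c(1, 2), c(1, 3)))         # FALSE
```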
Given two vectors, x and y, the code x == y compares the first element of x with the first element of y, then the second element of x with the second element of y, and so on. When using x == y, the lengths of x and y should be the same; otherwise the shorter vector is recycled (with a warning if the longer length is not a multiple of the shorter one). Here, compare means "is equal to", and therefore the output is a logical vector whose length is that of x (or y), or of the longer vector when the lengths differ.
In the code x %in% y, the first element of x is compared to all elements of y, then the second element of x is compared to all elements of y, and so on. Here, compare means "is the current element of x equal to any value in y", and therefore the output is a logical vector with the same length as x and not (necessarily) as y.
Here is a code snippet illustrating the difference. Note that x and y have the same length, but the elements of y are the elements of x in a different order. Note too that in the final examples x, a 3-element vector, is compared to the letters vector, which contains 26 elements.
> x <- c('a','b','c')
> y <- c('c', 'b', 'a')
> x == y
[1] FALSE TRUE FALSE
> x %in% y
[1] TRUE TRUE TRUE
> x %in% letters
[1] TRUE TRUE TRUE
> letters %in% x
[1] TRUE TRUE TRUE FALSE FALSE FALSE
[7] FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE
[19] FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE
Try it for objects of different length.
ac <- c("a", "b", "c")
ae <- c("a", "b", "c", "d", "e")
ac %in% ae
[1] TRUE TRUE TRUE
ac == ae
[1] TRUE TRUE TRUE FALSE FALSE
Warning message:
In ac == ae :
longer object length is not a multiple of shorter object length
It becomes clear that %in% checks whether each element of one object is contained in the other object, whereas == is a comparison operator that checks equality position by position.
== checks whether each element of a vector is equal to the element in the same position of another vector. Ideally the two vectors have the same length; otherwise the results can be unexpected, because when the lengths don't match R recycles the shorter vector, silently if the longer length is a multiple of the shorter one. For instance
c(1,2,3) == c(1,3,2)
[1] TRUE FALSE FALSE
or
c(1,2) == c(1,3,2)
[1] TRUE FALSE FALSE
Warning message:
In c(1, 2) == c(1, 3, 2) :
longer object length is not a multiple of shorter object length
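To make the recycling visible, a minimal sketch: base R's rep_len() extends the shorter vector to the longer length by hand, which gives the same result as == without the warning. The names short and long are illustrative.

```r
short <- c(1, 2)
long <- c(1, 3, 2)

# Equivalent to what == does: recycle `short` to c(1, 2, 1)
explicit <- rep_len(short, length(long)) == long

# The implicit version, with its length warning silenced
implicit <- suppressWarnings(short == long)

print(explicit)                       # TRUE FALSE FALSE
print(identical(explicit, implicit))  # TRUE
```

If unequal lengths are never intended, a guard such as stopifnot(length(short) == length(long)) placed before the comparison turns silent recycling into an error.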
%in%, on the other hand, checks which elements of the first vector are included in the second
c(1,2,3) %in% c(1,3,2)
[1] TRUE TRUE TRUE
or
c(1,2) %in% c(1,3,2)
[1] TRUE TRUE
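Since the question is about filtering, here is a minimal sketch (hypothetical data frame df and vector keep) of how the two operators can return different row counts. With ==, keep is silently recycled along the rows, so a row only matches if its value lines up with the recycled element at the same position.

```r
df <- data.frame(id = 1:6, g = c("a", "b", "a", "b", "c", "a"))
keep <- c("a", "b")

# %in%: every row whose g is "a" or "b" (rows 1, 2, 3, 4, 6)
by_in <- df[df$g %in% keep, ]

# ==: keep is recycled to "a", "b", "a", "b", "a", "b"; row 6 ("a")
# is compared with "b", so it is dropped (rows 1, 2, 3, 4)
by_eq <- df[df$g == keep, ]

print(nrow(by_in))  # 5
print(nrow(by_eq))  # 4
```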
I came across a question like this: "retrieve all values less than or equal to 5 from a vector of the sequence 1 through 9, having a length of 9". Based on my knowledge so far, I did some trial and error, and finally executed the following code:
vec <- c(1:9) ## assigns to vec
lessThanOrEqualTo5 <- vec[vec <= 5]
lessThanOrEqualTo5
[1] 1 2 3 4 5
I know that the code vec <= 5 returns the following logical vector:
[1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
So my question is: how does R use these logical values to return the values satisfying the condition, since the code ends up having a structure like vec[c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)]?
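A minimal sketch of logical indexing in base R: a logical index selects the elements at the positions where it is TRUE, which is equivalent to indexing with which(), the integer positions of the TRUE values. The last line shows that a shorter logical index is recycled, just like the operands of ==.

```r
vec <- 1:9
idx <- vec <= 5            # one TRUE/FALSE per element of vec

print(vec[idx])            # 1 2 3 4 5: elements where idx is TRUE
print(which(idx))          # 1 2 3 4 5: the positions of the TRUE values
print(vec[which(idx)])     # same selection, using integer positions

# A shorter logical index is recycled: c(TRUE, FALSE) picks odd positions
print(vec[c(TRUE, FALSE)]) # 1 3 5 7 9
```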
I am trying to change the logical values (elements) of my list based on another list. Basically, where both lists are TRUE, I want to change the value in the main list to FALSE. Both lists have length 5. For example
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <-list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
List B has sequences as name attributes.
Desired output:
Output <- list(c(TRUE,FALSE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,FALSE,FALSE))
In other words, elements in List_A remain the same unless the corresponding values are TRUE in both lists, in which case they are set to FALSE.
I've tried running the for loop below, but it doesn't work, and I don't know how I would redirect the output if it did.
for(i in 1:length(List_A)) { List_A[[i]][List_B[[i]]] <- FALSE }
You can use the Map function.
If both values are TRUE, set the element to FALSE; otherwise keep the value from List_A.
Output <- Map(function(x, y) replace(x, x & y, FALSE), List_A, List_B)
Output
#[[1]]
#[1] TRUE FALSE TRUE
#[[2]]
#[1] FALSE FALSE TRUE
#[[3]]
#[1] FALSE FALSE FALSE
#[[4]]
#[1] TRUE TRUE FALSE
#[[5]]
#[1] TRUE FALSE FALSE
data
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <- list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
We can use map2
library(purrr)
# keep an element of .x only where the matching element of .y is not TRUE
map2(List_A, List_B, ~ .x & !.y)
data
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <- list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))
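For comparison, a plain for loop can also produce the desired output without overwriting List_A. This is a minimal sketch that first copies List_A into Output and then sets to FALSE the positions that are TRUE in both lists.

```r
List_A <- list(c(TRUE,FALSE,TRUE),c(FALSE,TRUE,TRUE),c(FALSE,FALSE,FALSE),c(TRUE,TRUE,TRUE),c(TRUE,FALSE,TRUE))
List_B <- list(c(FALSE,FALSE,FALSE),c(TRUE,TRUE,FALSE),c(TRUE,TRUE,TRUE),c(FALSE,FALSE,TRUE),c(FALSE,TRUE,TRUE))

Output <- List_A                      # work on a copy; List_A stays unchanged
for (i in seq_along(List_A)) {
  both <- List_A[[i]] & List_B[[i]]   # TRUE where both lists are TRUE
  Output[[i]][both] <- FALSE
}
str(Output)
```

seq_along() is used instead of 1:length() so that the loop simply does nothing for an empty list.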
I am trying to subset rows in R depending on the values in column y, as shown below:
I have the data set "data", which looks like this:
data <- data.frame(y = c(0,0,2000,1500,20,77,88),
a = "bla", b = "bla")
And would end up with this:
I have this R code:
library(dplyr)  # for arrange() and desc()
data <- arrange(subset(data, y != 0 & y < 1000 & y !=77 & [...]), desc(y))
print(head(data, n =100))
Which works.
However, I would like to collect the values to exclude in a list, such as:
[0, 1000, 77]
and somehow loop through it, with the lowest possible running time, instead of hardcoding them directly in the formula. Any ideas?
The list should only contain the values for the "!=" conditions:
[0, 77]
and the "<" condition should remain in the formula or go in another list.
I'm going to answer your original question because it's more interesting. I hope you won't mind.
Imagine you had values and operators to apply to your data:
my.operators <- c("!=","<","!=")
my.values <- c(0,1000,77)
You can use Map from base R to apply a function to two vectors in parallel. Here I'll use get so we can obtain the actual operator function from its name given as a character string.
Map(function(x,y)get(y)(data$y,x),my.values,my.operators)
[[1]]
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[[2]]
[1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE
[[3]]
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE
As you can see, we get a list with one logical vector per (value, operator) pair.
To better understand what's going on here, consider only the first value of each vector:
get("!=")(data$y,0)
[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE
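As an aside, base R's match.fun() performs the same name-to-function lookup as get() here, but it only looks up functions, so it is not confused by a non-function object that happens to share the name. A small sketch using the y column from the question and two of the operator names in my.operators:

```r
y <- c(0, 0, 2000, 1500, 20, 77, 88)   # the y column from the question

ne <- match.fun("!=")   # the same function as `!=`
lt <- match.fun("<")    # the same function as `<`

print(ne(y, 0))      # FALSE FALSE TRUE TRUE TRUE TRUE TRUE
print(lt(y, 1000))   # TRUE TRUE FALSE FALSE TRUE TRUE TRUE
```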
Now we can use Reduce to combine the logical vectors with &. For example, applying only != to every value in my.values:
Reduce(`&`,lapply(my.values,function(x) data$y!=x))
[1] FALSE FALSE TRUE TRUE TRUE FALSE TRUE
And finally subset the data:
data[Reduce("&",Map(function(x,y)get(y)(data$y,x),my.values,my.operators)),]
y a b
5 20 bla bla
7 88 bla bla
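For the updated question, where only the "!=" values go in a list and the "<" condition stays separate, %in% can replace the chain of != tests with one vectorized call. A minimal base-R sketch: exclude and upper are illustrative names, and order() stands in for dplyr's arrange(desc(y)).

```r
data <- data.frame(y = c(0, 0, 2000, 1500, 20, 77, 88),
                   a = "bla", b = "bla")

exclude <- c(0, 77)   # values for the "!=" conditions
upper <- 1000         # the "<" condition, kept separately

# TRUE where y is not one of the excluded values and is below `upper`
keep <- !(data$y %in% exclude) & data$y < upper

result <- data[keep, ]
result <- result[order(result$y, decreasing = TRUE), ]  # like arrange(desc(y))
print(result)
```

Adding another excluded value only means extending exclude; the condition itself does not change.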
1) I need to intersect two vectors and return a vector of the same length containing the intersected values.
intersect() does not return a vector of the same size.
2) Also, why does this return c(TRUE, TRUE, TRUE) and not c(FALSE, TRUE, TRUE)?
set1 = c(TRUE,FALSE,TRUE)
set2 = c(FALSE,FALSE,TRUE)
testset = set1 %in% set2
> print(testset)
[1] TRUE TRUE TRUE
I get TRUE TRUE TRUE as the result, but I need FALSE FALSE TRUE.
To do the intersection, you need to use the element-wise logical AND operator &, as follows:
testset = set1 & set2
This will give you the following result: FALSE FALSE TRUE
Hope it helps.
A %in% B checks, for every element of A, whether that element is in B. The result always has the same length as A, i.e. length(A). Try e.g.
1:3 %in% 1:9
#[1] TRUE TRUE TRUE
1:9 %in% 1:3
#[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
I think what you want is this:
set1 == set2
[1] FALSE TRUE TRUE
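To put the three operators side by side on the question's vectors, and to sketch question 1 under one possible reading, here is a small example. The reading is an assumption: keep each value of x that also occurs in y and use NA elsewhere, so the result has the same length as x. The vectors x and y are made up for illustration.

```r
set1 <- c(TRUE, FALSE, TRUE)
set2 <- c(FALSE, FALSE, TRUE)

print(set1 %in% set2)  # TRUE TRUE TRUE: each value of set1 occurs somewhere in set2
print(set1 == set2)    # FALSE TRUE TRUE: position-by-position equality
print(set1 & set2)     # FALSE FALSE TRUE: TRUE only where both are TRUE

# Assumed reading of question 1: a same-length "intersection" padded with NA
x <- c(3, 8, 1, 5)
y <- c(5, 3, 9)
same_length <- ifelse(x %in% y, x, NA)
print(same_length)     # 3 NA NA 5
```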