Compare elements in 2 vectors in R [duplicate]

This question already has answers here:
How to find common elements from multiple vectors?
(3 answers)
Closed 7 years ago.
I have a question about comparing elements in 2 vectors. For example, I have 2 vectors:
a<-c(8, 28, 23, 21, 7, 3, 24, 6, 1, 4)
b<-c(28, 27, 8, 7, 6, 23, 21, 3, 1, 26)
Now I want to answer the question "How many elements in a are also in b?"
Here the common numbers are 1, 3, 6, 7, 8, 21, 23, 28, so there are 8 elements in common.
Is there a function in R that can answer this question? Thank you in advance.

You can try the intersect function:
> intersect(a, b)
[1] 8 28 23 21 7 3 6 1
Edit: to get the count, use the length function:
> length(intersect(a, b))
[1] 8
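A membership test with %in% is another way to get the same count. The sketch below uses only base R and the vectors from the question; the names in_b, n_common and common are introduced here for illustration. Unlike intersect, this approach counts a repeated value in a each time it appears, which only matters if a contains duplicates.

```r
a <- c(8, 28, 23, 21, 7, 3, 24, 6, 1, 4)
b <- c(28, 27, 8, 7, 6, 23, 21, 3, 1, 26)

# Logical vector: TRUE where the element of a also occurs in b
in_b <- a %in% b

# Number of elements of a that occur in b
n_common <- sum(in_b)

# The matching elements themselves, in the order they appear in a
common <- a[in_b]
```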

Related

How to get the positions of the k greatest (or smallest) elements of a vector? [duplicate]

This question already has answers here:
How to find the largest N elements in a list in R?
(4 answers)
Closed 10 months ago.
I have a vector and I want to find the indices of the k greatest elements, not the elements themselves, which I could get with sort. One idea would be to attach indices to the values and use a custom sort function that only compares the first element of each pair (a classical solution to this problem), but surely there has to be a simpler way? Note that performance isn't a concern.
First I create an example vector:
vector <- c(1, 3, 6, 2, 7, 8, 10, 4)
Next, you can use the following code which will output the top k elements as x with index ix:
k <- 3
lst <- sort(vector, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),k))
Output:
$x
[1] 10 8 7
$ix
[1] 7 6 5
As you can see ix gives the index of the top k elements.
Using rank.
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 5 6 7
If you have ties, the result is this:
x <- c(10, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 1 6 7
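Another option, sketched here with base R only, is order, which returns the permutation of indices that sorts the vector. Taking the first k entries gives the positions of the k greatest (or, without decreasing = TRUE, the k smallest) elements. The names top_idx and bottom_idx are illustrative. Unlike the rank approach, this always returns exactly k indices, even when there are ties.

```r
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
k <- 3

# Indices of the k greatest elements, largest first
top_idx <- head(order(x, decreasing = TRUE), k)

# Indices of the k smallest elements, smallest first
bottom_idx <- head(order(x), k)
```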

R: find value/index which is not in rank

Imagine a vector whose values should be in increasing order, but one or several values are out of order. How can I find the indices of the values that break ranks?
I tried to use diff() but I couldn't find a way that works for all the different cases. I'm somewhat lost with this. Thanks for any help.
E.g. a rank-breaking element at index 4:
t <- c(4, 6, 10, 30, 15, 20, 31) # how to find the index of the value 30?
which(diff(t)<0)
> 4
This works, but what if the element breaking rank is 2 instead of 30?
t <- c(4, 6, 10, 2, 15, 20, 31) # how to find the index of the value 2?
which(diff(t)<0)
> 3
Or what if it is the last element?
t <- c(4, 6, 10, 12, 15, 20, 9) # how to find the index of the value 9?
which(diff(t)<0)
> 6
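One possible approach, offered only as a sketch and not as an answer from the original thread, is to ask which single element can be removed so that the rest of the vector is sorted. The helper name find_rank_breakers is hypothetical. It assumes exactly one element is out of place; for an already sorted vector it returns every index, and for ambiguous inputs such as c(1, 3, 2, 4) it returns more than one candidate.

```r
# Hypothetical helper: indices whose removal leaves the vector in
# non-decreasing order. Assumes a single out-of-place element.
find_rank_breakers <- function(v) {
  sorted_without <- vapply(
    seq_along(v),
    function(i) !is.unsorted(v[-i]),  # is the vector sorted once element i is dropped?
    logical(1)
  )
  which(sorted_without)
}

r1 <- find_rank_breakers(c(4, 6, 10, 30, 15, 20, 31))  # the value 30
r2 <- find_rank_breakers(c(4, 6, 10, 2, 15, 20, 31))   # the value 2
r3 <- find_rank_breakers(c(4, 6, 10, 12, 15, 20, 9))   # the value 9
```

This checks every element, so it does more work than diff, which is acceptable for short vectors.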

Sort, by the difference of two numbers

I want to sort an array in increasing order of the difference between the biggest and smallest number,
without loops.
I think I need a sort that accepts a condition, but I can't find out how.
Something like this:
sort(arr, decreasing = FALSE, by = max(a) - min(a))
sort(arr, decreasing = FALSE, condition = max(a) - min(a))
The sorted array has to look like this: the difference between the first and second number is the smallest of all differences in the array, the difference between the second and the third is the second smallest, and so on.
Example (I think it is like this):
array(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)
sorted_array(9, 11, 7, 13, 6, 22, 3, 23, 2, 32)
I think another way is to construct the sorted array by putting the biggest number in the last position, before it the smallest, then the second biggest, the second smallest, and so on.
Sorry for the bad explanation.
This is an idea of how it could work, but only for arrays of even length. If you want to use this solution with odd-length arrays, you can handle them with an if. I have to admit that it would have to be urgent for me to use a construction like this instead of a loop.
x <- c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)
n <- length(x)
m <- floor(n/2)
rev(
  as.numeric(
    rbind(
      sort(x)[n-c(0:(m-1))],  # the m largest values, largest first
      sort(x)[1:m]            # the m smallest values, smallest first
    )
  )
)
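The even-length restriction can be lifted with a small wrapper. The following is a sketch under one assumption that the original posts do not state: for an odd-length input, the median element is placed first, since it is closest to both of its neighbours. The function name interleave_by_spread is hypothetical, and the input is assumed to have at least two elements.

```r
# Hypothetical helper: order x so that consecutive differences grow.
# Assumes length(x) >= 2; for odd lengths the median goes first.
interleave_by_spread <- function(x) {
  s <- sort(x)
  n <- length(s)
  m <- n %/% 2
  low  <- s[seq_len(m)]          # m smallest values, ascending
  high <- s[n - seq_len(m) + 1]  # m largest values, descending
  mid  <- if (n %% 2 == 1) s[m + 1] else numeric(0)
  # rbind pairs largest with smallest; c() reads the matrix column by column
  c(mid, rev(c(rbind(high, low))))
}

even_result <- interleave_by_spread(c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13))
odd_result  <- interleave_by_spread(c(1, 2, 3, 4, 5))
```

For the even-length example this reproduces the result of the code above.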
I attempted to come up with a construction that avoids a for loop. I first sorted the sequence and split it into two halves by naming the elements of the first half "a_N" and those of the second half "b_N", then "folded" it into a two-row matrix with the first half reversed, and finally read it out by unfolding with c:
my_arr <- sort(c(22, 2, 32, 3, 6, 9, 7, 23, 11, 13)) # sort first so the halves are low and high
names(my_arr) <- paste0( rep( c("a","b"), each=length(my_arr)/2), order(my_arr) )
c( rbind( sort( my_arr[grep("a", names(my_arr))], decreasing=TRUE), # first half, reversed
my_arr[grep("b", names(my_arr))]) ) # second half
#[1] 9 11 7 13 6 22 3 23 2 32
You can see the intermediate value of the matrix:
rbind( sort( my_arr[grep("a", names(my_arr))], decreasing=TRUE), my_arr[grep("b", names(my_arr))])
     a5 a4 a3 a2 a1
[1,]  9  7  6  3  2
[2,] 11 13 22 23 32
And since R matrices are read out in column order you get the desired interleaving with c() which also removes the names.

Combining vector indexes and queries

I want to select the first 5 elements of a vector and those that are greater than a certain threshold. For example:
v = c(10, 11, 2, 8, 5, 2, 10)
v[1:5] # return the first 5 elements
v[which(v>5)] # returns all elements > 5
How do I combine the two queries to return 10, 11, 2, 8, 5, 10? That is, the first 5 elements plus the final 10, because it is greater than 5.
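We could use union on the values, but note that it drops duplicate values, so the final 10 is lost here:
union(v[1:5], v[which(v>5)])
Or, as commented by @Vlo, take the union of the indices instead, which keeps duplicate values: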
v[union(1:5, which(v>5))]
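The same selection can also be written as a single logical condition, combining a position test with a value test. This is a sketch in base R; the name keep is illustrative. Elements are returned in their original order and duplicates are preserved.

```r
v <- c(10, 11, 2, 8, 5, 2, 10)

# TRUE for the first 5 positions, or wherever the value exceeds 5
keep <- seq_along(v) <= 5 | v > 5

selected <- v[keep]
```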

Finding groups of contiguous numbers in a list [duplicate]

This question already has answers here:
Sequence length encoding using R
(6 answers)
Closed 9 years ago.
This is a duplicate question to this, except for R rather than Python.
I'd like to identify groups of contiguous (some people call them continuous) integers in a list, where duplicate entries are treated as existing within the same range. Therefore:
myfunc(c(2, 3, 4, 4, 5, 12, 13, 14, 15, 16, 17, 17, 20))
returns:
min max
  2   5
 12  17
 20  20
Although any output format would be fine. My current brute-force, for-loop method is pretty slow.
(Apologies if I could have easily re-interpreted the Python answer and I'm being stupid!)
Just use diff:
x = c(2, 3, 4, 4, 5, 12, 13, 14, 15, 16, 17, 17, 20)
start = c(1, which(diff(x) != 1 & diff(x) != 0) + 1)
end = c(start - 1, length(x))
x[start]
# 2 12 20
x[end]
# 5 17 20
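To get the min/max table from the question, the diff idea can be wrapped in a function. The sketch below is one way to do it in base R, not the answerer's code: it labels each run with cumsum and summarises per run with tapply. The name contiguous_ranges is hypothetical, and the function assumes a non-empty integer-valued input; it sorts and de-duplicates first, so the input does not need to be ordered.

```r
# Hypothetical helper: one row per run of consecutive integers.
# Assumes x is non-empty and integer-valued.
contiguous_ranges <- function(x) {
  x <- sort(unique(x))                     # duplicates belong to the same run
  run_id <- cumsum(c(TRUE, diff(x) != 1))  # a new run starts wherever the gap is not 1
  data.frame(
    min = as.numeric(tapply(x, run_id, min)),
    max = as.numeric(tapply(x, run_id, max))
  )
}

ranges <- contiguous_ranges(c(2, 3, 4, 4, 5, 12, 13, 14, 15, 16, 17, 17, 20))
```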
