Number of overlapping elements - r

I've got two vectors:
vec1 <- c(1,0,1,1,1)
vec2 <- c(1,1,0,1,1)
The vectors have the same elements at positions 1, 4 and 5.
How can I return the number of elements that overlap in the two vectors, taking position into account? So, here I would like to return the number 3.

Test for equality, then sum. You might want to exclude NAs:
sum(vec1==vec2, na.rm=TRUE)
EDIT
Exclude 0==0 matches by adding an exclusion like:
sum(vec1==vec2 & vec1!=0, na.rm=TRUE)
Thanks to @CarlWitthoft
Or, if you have only ones and zeros, then:
sum((vec1+vec2)==2, na.rm=TRUE)

If your entries are only 0 and 1 (or if you only care about 0 versus anything that is not 0), you can use xor to determine where they differ and then sum its negation. Otherwise you would have to test for equality, as @zx8754 commented:
sum(!xor(vec1,vec2))
[1] 3
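To see how the variants above differ, here is a small sketch on a slightly extended input. The sixth element (a shared 0) is an assumption added for illustration; it is not part of the original vectors.

```r
# Original vectors, each extended with one extra 0 (illustrative addition)
vec1 <- c(1, 0, 1, 1, 1, 0)
vec2 <- c(1, 1, 0, 1, 1, 0)

# Position-wise equality: the shared 0 at position 6 counts as a match
n_equal <- sum(vec1 == vec2, na.rm = TRUE)

# Same comparison, but ignoring positions where both values are 0
n_nonzero <- sum(vec1 == vec2 & vec1 != 0, na.rm = TRUE)

# xor() treats any non-zero value as TRUE, so 0/0 pairs also count as matches
n_xor <- sum(!xor(vec1, vec2))

print(c(equal = n_equal, nonzero = n_nonzero, xor = n_xor))
```

On this input the first and third counts come out as 4 and the second as 3, so which variant to pick depends on whether a shared 0 should count as an overlap.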

Related

Count minimum values in a vector, where the minimum values are in consecutive order

As the title says, and it's probably very easy, but how can I count the number of minimum values in a vector, or more specifically in a subset of a vector?
Below is an example:
a <- c(1,1,1,2,2)
So I want the output to be equal to 3 (since there are three 1's).
You can use == to get a logical vector; sum() then counts the number of TRUE values in it.
sum(a == min(a))
# [1] 3
You can use table, i.e.
table(a)[1]
#1
#3
or, if you want to drop the name,
unname(table(a)[1])
#[1] 3
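We can use tabulate (this counts occurrences of the integer 1, so it only gives the count of the minimum when the minimum is 1):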
tabulate(a)[1]
#[1] 3
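The three approaches above agree on the example because its minimum happens to be 1. A minimal sketch of how they behave otherwise, using a hypothetical vector b (not from the question) whose minimum is 3:

```r
# Hypothetical input whose minimum is not 1
b <- c(3, 3, 5, 7)

# Compare against the minimum, then count the TRUEs
n_eq <- sum(b == min(b))

# table() sorts its entries by value, so the first entry belongs to the minimum
n_table <- unname(table(b)[1])

# tabulate() counts the integers 1, 2, 3, ...; slot 1 is the count of 1s
n_tabulate <- tabulate(b)[1]

print(c(eq = n_eq, table = n_table, tabulate = n_tabulate))
```

Here the == and table versions both give 2, while tabulate(b)[1] gives 0 because b contains no 1s.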

Get indices of two values that bracket zero in R

I have a vector x:
x <- c(-1,-0.5,-0.1,-0.001,0.5,0.6,0.9)
I want the index of the closest negative value to zero and the closest positive value to zero. In this case, 4 and 5. x is not necessarily sorted.
I can do this by setting numbers to NA:
# negative numbers only
tmp <- x
tmp[x > 0] <- NA
which.max(tmp)
# positive numbers only
tmp <- x
tmp[x < 0] <- NA
which.min(tmp)
But that seems clunky. Any tips?
good scenario
If you are in the classic case, where
1. your vector is sorted in increasing order,
2. it does not include 0,
3. it has no tied values,
you can simply do the following:
findInterval(0, x, TRUE) + 0:1
If condition 1 does not hold, but conditions 2 and 3 still hold, you can do
sig <- order(x)
sig[findInterval(0, x[sig], TRUE) + 0:1]
akrun's answer is fundamentally the same.
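As a rough illustration of the two findInterval calls above, here is a sketch on the question's vector and on a shuffled copy. The vector y is a hypothetical reordering added for this example.

```r
# Sorted case: the example vector from the question
x <- c(-1, -0.5, -0.1, -0.001, 0.5, 0.6, 0.9)
idx_sorted <- findInterval(0, x, TRUE) + 0:1

# Unsorted case: y is a hypothetical shuffle of the same values
y <- c(0.6, -0.1, 0.9, -1, 0.5, -0.001, -0.5)
sig <- order(y)                                    # permutation that sorts y
idx_unsorted <- sig[findInterval(0, y[sig], TRUE) + 0:1]

print(idx_sorted)       # positions in x
print(idx_unsorted)     # positions in y
print(y[idx_unsorted])  # the two values that bracket zero
```

Here idx_sorted comes out as 4 5 and idx_unsorted as 6 5, i.e. the positions of -0.001 and 0.5 in y.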
bad scenario
Things become tricky once your vector x contains 0 or tied / repeated values, because:
repeated values challenge sorting-based methods, as a sorting method like "quick sort" is not stable (see "What is stability in sorting algorithms and why is it important?" if you don't know what a stable sort is);
findInterval will locate 0 itself when 0 is present.
In this situation, you have to adapt Ronak Shah's answer, which allows you to exclude 0. But be aware that which may give you multiple indexes if there are repeated values.
Another way could be:
#closest positive value to zero.
which(x == min(x[x > 0]))
#[1] 5
#closest negative value to zero
which(x == max(x[x < 0]))
#[1] 4
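A short sketch of how the which-based approach behaves when the vector contains 0 and tied values. The vector x2 is made up for this example.

```r
# Hypothetical unsorted input with a zero and tied values
x2 <- c(-2, 0, 3, -1, 3, -1, 5)

# The strict comparisons (> 0 and < 0) leave the zero out of both searches
pos_idx <- which(x2 == min(x2[x2 > 0]))  # all positions of the smallest positive value
neg_idx <- which(x2 == max(x2[x2 < 0]))  # all positions of the largest negative value

print(pos_idx)
print(neg_idx)

# If a single index per side is wanted, one option is to keep the first hit
first_pos <- pos_idx[1]
first_neg <- neg_idx[1]
```

With ties, each which call returns several indexes (3 5 and 4 6 here), so the caller has to decide which one to keep.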
We could try
rle(sign(x))$lengths[1] + 0:1
#[1] 4 5
If it is unsorted, then
x1 <- sort(x)
match(x1[rle(sign(x1))$lengths[1] + 0:1], x)

How to find if two or more consecutive elements of a vector are equal in R

I want to find a way to determine if two or more consecutive elements of a vector are equal.
For example, in the vector x=c(1,1,1,2,3,1,3), the first, second and third elements are equal.
With the following command, I can check a vector, say y, for two or more consecutive elements equal to 2 or 3 (it returns TRUE when there are none):
all(rle(y)$lengths[which( rle(y)$values==2 | rle(y)$values==3 )]==1)
Is there any other faster way?
EDIT
Let's say we have the vector z=c(1,1,2,1,2,2,3,2,3,3).
I want a vector with three elements as output. The first element refers to the value 1, the second to 2 and the third to 3. An element of the output is 1 if z contains two or more consecutive occurrences of the corresponding value, and 0 otherwise. So, the output for the vector z will be (1,1,1).
For the vector w=c(1,1,2,3,2,3,1) the output will be (1,0,0), since only the value 1 occurs in two consecutive positions, namely the first and second positions of w.
I'm not entirely sure I understand your question, as it could be worded better. The first part just asks how to find whether consecutive elements in a vector are equal. The answer is to use the diff() function combined with a check for a difference of zero:
z <- c(1,1,2,1,2,2,3,2,3,3)
sort(unique(z[which(diff(z) == 0)]))
# [1] 1 2 3
w <- c(1,1,2,3,2,3,1)
sort(unique(w[which(diff(w) == 0)]))
# [1] 1
But your edit example seems to imply you are looking to see whether there are repeated values in a vector that will only contain the integers 1, 2, or 3. Your output will always be X, Y, Z, where
X is 1 if there is at least one "1" repeated, else 0
Y is 2 if there is at least one "2" repeated, else 0
Z is 3 if there is at least one "3" repeated, else 0
Is this correct?
If so, see the following
continuously <- function(x){
  s <- sort(unique(x[which(diff(x) == 0)]))
  output <- c(0,0,0)
  output[s] <- s
  return(output)
}
continuously(z)
# [1] 1 2 3
continuously(w)
# [1] 1 0 0
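The function above marks a repeated value with the value itself, whereas the question's edit asks for 0/1 flags. A minimal sketch of a flag-returning variant based on rle; the name has_run and its values argument are hypothetical, not part of the answer.

```r
# Hypothetical helper: 1 if a value occurs in a run of length >= 2, else 0
has_run <- function(x, values = 1:3) {
  r <- rle(x)                              # run lengths and the value of each run
  in_run <- r$values[r$lengths >= 2]       # values that repeat consecutively
  as.integer(values %in% in_run)           # one 0/1 flag per entry of `values`
}

z <- c(1, 1, 2, 1, 2, 2, 3, 2, 3, 3)
w <- c(1, 1, 2, 3, 2, 3, 1)

print(has_run(z))
print(has_run(w))
```

This gives 1 1 1 for z and 1 0 0 for w, matching the outputs requested in the question.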
Assuming your series is z=c(1,1,2,1,2,2,3,2,3,3), you can do:
(unique(z[c(FALSE, diff(z) == 0)]) >= 0)+0
which outputs 1, 1, 1.
When you run the above command on your other sequence:
w=c(1,1,2,3,2,3,1)
then
(unique(w[c(FALSE, diff(w) == 0)]) >= 0)+0
returns 1.
You may also try this for an exact output like 1,1,1 or 1,0,0:
(unique(z[c(FALSE, diff(z) == 0)]) == unique(z))+0 #1,1,1 for z and 1,0,0 for w
Logic:
diff takes the difference between each item and the one before it. Since there is always one fewer difference than there are items, I have prepended FALSE for the first item. The resulting logical vector, which is TRUE wherever the difference is zero, is used to subset the original sequence. Finally, we convert the selected values to 1s by asking whether they are greater than or equal to 0 (you could use some other condition that is always true to get the 1s).
This assumes your sequence doesn't have negative numbers.
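The steps described above can be traced one at a time. The sketch below also swaps == for %in% in the last step, which is my substitution rather than the answer's code: comparing with == relies on recycling, which looks fragile when the number of repeated values differs from the number of unique values. The vector v is a hypothetical input where only 1 and 3 repeat.

```r
z <- c(1, 1, 2, 1, 2, 2, 3, 2, 3, 3)

d <- diff(z)                       # difference between each item and the previous one
is_repeat <- c(FALSE, d == 0)      # TRUE where an item equals its predecessor
repeated <- unique(z[is_repeat])   # values that occur at least twice in a row

# Membership test instead of ==, so lengths do not need to line up
flags_z <- (sort(unique(z)) %in% repeated) + 0

# Hypothetical input: only 1 and 3 repeat, 2 does not
v <- c(1, 1, 2, 3, 3)
flags_v <- (sort(unique(v)) %in% unique(v[c(FALSE, diff(v) == 0)])) + 0

print(flags_z)
print(flags_v)
```

This yields 1 1 1 for z and 1 0 1 for v.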

How do I count the number of pattern occurrences, if the pattern includes NA, in R?

I have a string of 0's, 1's and NA's like so:
string<-c(0,1,1,0,1,1,NA,1,1,0,1,1,NA,1,0,
0,1,0,1,1,1,NA,1,0,1,NA,1,NA,1,0,1,0,NA,1)
I'd like to count the number of times the pattern "1-NA-1" occurs. In this instance, I would like to get the count 5.
I've tried table(string), and tried to replicate this, but nothing seems to work. I would appreciate anyone's help!
# some ugly code, but it seems to work
sum( head(string, -2) == 1 & is.na(head(string[-1],-1))
& string[-1:-2] == 1, na.rm = TRUE)
Something like:
x <- which(is.na(string))
x <- x[!x %in% c(1,length(string))]
length(x[string[x-1] & string[x+1]])
# [1] 5
-- REASONING --
First, we check which values of string are NA with is.na(string). Then we find those indices with which and store them in x.
As @Rick mentions, if the first/last value is NA it would lead to problems in our next step. So, we make sure that those are removed (as they shouldn't count anyway).
Next, we want to find the situation where both string[x-1] and string[x+1] are 1. In other words, 1 & 1. Note that FALSE and TRUE can be evaluated as 0 and 1 respectively. So, if you type 1 == TRUE you will get TRUE. If you type 1 & 1 you will also get TRUE back. So, string[x-1] & string[x+1] will return TRUE when both are 1, and FALSE otherwise. We basically obtain a logical vector, and subset x with that vector to get all positions in x that satisfy our search. Then we use length to determine how many there are.
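One case the reasoning above does not cover is two NAs next to each other: the neighbour test then yields NA, and subsetting with NA keeps an extra element. A sketch of a variant that compares the neighbours with 1 explicitly and drops NA results; the function name count_1_NA_1 and the edge-case vector are assumptions for illustration.

```r
# Hypothetical helper: count occurrences of the pattern 1, NA, 1
count_1_NA_1 <- function(v) {
  pos <- which(is.na(v))                      # positions of all NAs
  pos <- pos[pos > 1 & pos < length(v)]       # an NA at either end has no two neighbours
  # TRUE only when both neighbours are exactly 1; NA neighbours are dropped
  sum(v[pos - 1] == 1 & v[pos + 1] == 1, na.rm = TRUE)
}

string <- c(0,1,1,0,1,1,NA,1,1,0,1,1,NA,1,0,
            0,1,0,1,1,1,NA,1,0,1,NA,1,NA,1,0,1,0,NA,1)
n_question <- count_1_NA_1(string)

# Made-up edge case: NAs at both ends and two adjacent NAs
edge <- c(NA, 1, NA, NA, 1, NA, 1, NA)
n_edge <- count_1_NA_1(edge)

print(c(question = n_question, edge = n_edge))
```

On the question's data this gives 5, and on the edge case it gives 1, since only the NA at position 6 sits between two 1s.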

An elegant way to count number of negative elements in a vector?

I have a data vector with 1024 values and need to count the number of negative entries. Is there an elegant way to do this without looping and checking if an element is <0 and incrementing a counter?
You want to read 'An Introduction to R'. Your answer here is simply
sum( x < 0 )
which works thanks to vectorisation. The x < 0 expression returns a vector of booleans over which sum() can operate (by converting the booleans to standard 0/1 values).
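A small sketch of the same idea when the data may contain missing values. The vector below is made up for illustration; note that sum() returns NA unless na.rm = TRUE is set.

```r
# Hypothetical data with one missing value
x <- c(3, -1, NA, -7, 0)

n_raw <- sum(x < 0)                     # NA, because one comparison is NA
n_neg <- sum(x < 0, na.rm = TRUE)       # number of negative entries
share_neg <- mean(x < 0, na.rm = TRUE)  # proportion of negatives among non-missing values

print(n_raw)
print(n_neg)
print(share_neg)
```

Here n_neg is 2 and share_neg is 0.5, since two of the four non-missing values are negative.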
There is a good answer to a related question from Steve Lianoglou: "How to identify the rows in my dataframe with a negative value in any column?"
Let me just replicate his code with one small addition (the 4th point).
Imagine you had a data.frame like this:
df <- data.frame(a = 1:10, b = c(1:3,-4, 5:10), c = c(-1, 2:10))
This will return you a boolean vector of which rows have negative values:
has.neg <- apply(df, 1, function(row) any(row < 0))
Here are the indexes of the rows with negative numbers:
which(has.neg)
Here is a count of the rows with negative numbers:
length(which(has.neg))
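As an alternative to apply, the same row test can be sketched with a vectorised comparison, since df < 0 on an all-numeric data frame gives a logical matrix. This is a variant I am suggesting, not part of the quoted answer, and the variable names are illustrative.

```r
df <- data.frame(a = 1:10, b = c(1:3, -4, 5:10), c = c(-1, 2:10))

# df < 0 is a logical matrix; a row has a negative if its row sum is positive
has_neg <- rowSums(df < 0) > 0

neg_rows <- unname(which(has_neg))  # row indexes with at least one negative value
n_rows   <- sum(has_neg)            # number of such rows
n_cells  <- sum(df < 0)             # total number of negative cells

print(neg_rows)
print(c(rows = n_rows, cells = n_cells))
```

For this data frame the negative rows are 1 and 4, so both counts come out as 2.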
The solutions above need to be tweaked in order to apply them to a data frame.
The command below gets the count of negative values, or of values satisfying any other logical condition.
Suppose you have a dataframe:
df <- data.frame(x=c(2,5,-10,NA,7), y=c(81,-1001,-1,NA,-991))
In order to get the count of negative records in x (which() drops the NA comparison, which would otherwise be counted as an extra row):
nrow(df[which(df$x<0),])
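Because this data frame contains NA, it matters how the rows are selected: a logical row index that contains NA produces an all-NA row. A sketch comparing plain logical indexing with NA-safe counts; the variable names are illustrative.

```r
df <- data.frame(x = c(2, 5, -10, NA, 7), y = c(81, -1001, -1, NA, -991))

# Plain logical indexing keeps an all-NA row for the NA comparison
n_naive <- nrow(df[df$x < 0, ])

# NA-safe count for a single column
n_x <- sum(df$x < 0, na.rm = TRUE)

# NA-safe count for every column at once
per_col <- colSums(df < 0, na.rm = TRUE)

print(n_naive)
print(n_x)
print(per_col)
```

Here n_naive is 2 even though x has a single negative value, while the NA-safe counts are 1 for x and 3 for y.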
