Counting number of Boolean switches in R

With the array:
my_array <- c(F,T,T,F,F,T,T,T,F,T,F)
I need a script that will tell me how many times the value went from FALSE to TRUE. Just by eye it's easy to see it did that 3 times. I'm only interested in it switching from FALSE to TRUE and NOT from TRUE to FALSE.

Since you care only about the times when it went from FALSE to TRUE, this is the number of times the diff of the vector is equal to 1:
sum(diff(my_array) == 1)
# [1] 3
This is in my opinion the most direct way to address your question, but note that R also has the excellent rle function that returns the run-length encoding of your vector, namely the length of each section of consecutive values within the vector. You could use rle to address your particular query by counting the number of runs (excluding the last) that take the FALSE value:
sum(head(rle(my_array)$values, -1) == FALSE)
# [1] 3
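To make it easier to see what the two approaches above operate on, here is a small sketch that prints the intermediate values for the same my_array. It uses only base R; the names d and r are just for illustration.

```r
my_array <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)

# diff() treats FALSE/TRUE as 0/1, so a FALSE -> TRUE step shows up as +1
# and a TRUE -> FALSE step as -1.
d <- diff(my_array)
d
# [1]  1  0 -1  0  1  0  0 -1  1 -1

# rle() collapses consecutive equal values into runs.
r <- rle(my_array)
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
r$lengths
# [1] 1 2 2 3 1 1 1
```

Every FALSE run other than a trailing one is necessarily followed by a TRUE run, which is why dropping the last run and counting the FALSE values gives the number of FALSE-to-TRUE switches.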
Note that both of these solutions took advantage of the fact that this is a vector with only TRUE/FALSE values. A general approach to count the number of transitions from some value A to some value B is to compare head(vector, -1) with tail(vector, -1) -- namely all but the last element of the vector against all but the first. In your case:
sum(head(my_array, -1) == FALSE & tail(my_array, -1) == TRUE)
# [1] 3
The first element of head(my_array, -1) == FALSE indicates whether the first element of my_array is FALSE, the second element is whether the second element is FALSE, and so on. Meanwhile, the first element of tail(my_array, -1) == TRUE indicates whether the second element of my_array is TRUE, the second element indicates whether the third element is TRUE, and so on. Therefore, the corresponding elements of head(my_array, -1) and tail(my_array, -1) are one apart and enable us to check conditions on pairs of elements.
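The head/tail comparison described above could be wrapped in a small helper. This is only a sketch: count_transitions is a made-up name, and the choice to ignore pairs involving NA (via na.rm = TRUE) is an assumption, not something the question asked for.

```r
# Count how often `from` is immediately followed by `to` in vector x.
# Pairs whose comparison evaluates to NA are dropped by na.rm = TRUE.
count_transitions <- function(x, from, to) {
  if (length(x) < 2) return(0L)
  sum(head(x, -1) == from & tail(x, -1) == to, na.rm = TRUE)
}

my_array <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
count_transitions(my_array, FALSE, TRUE)
# [1] 3

# The same helper works for non-logical values:
letters_vec <- c("a", "b", "c", "a", "b")
count_transitions(letters_vec, "a", "b")
# [1] 2
```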

Related

Subsetting a vector with a condition (excluding NA)

vector1 = c(1,2,3,NA)
condition1 = (vector1 == 2)
vector1[condition1]
vector1[condition1==TRUE]
In the above code, condition1 is "FALSE TRUE FALSE NA",
and the 3rd and the 4th lines both give me the result "2 NA",
which is not what I expected.
I wanted only the elements whose value is really 2, not including NA.
Could anybody explain why R is designed to work this way,
and how I can get the result I want with a simple command?
The subset vector[NA] will always be NA because the NA value is unknown and therefore the result of the subset is also unknown. %in% returns FALSE for NA, so it can be useful here.
vector1 = c(1,2,3,NA)
condition1 = (vector1 %in% 2)
vector1[condition1]
# [1] 2
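For comparison, here is a short sketch of the behaviour described above, together with which(), another base R function that drops NA positions because it returns only the indices where the condition is TRUE. The variable names other than vector1 are illustrative.

```r
vector1 <- c(1, 2, 3, NA)

# A logical NA in the index yields an NA in the result.
with_na <- vector1[vector1 == 2]
with_na
# [1]  2 NA

# which() keeps only the positions where the condition is TRUE.
via_which <- vector1[which(vector1 == 2)]
via_which
# [1] 2

# %in% never returns NA, so it can be used as an index directly.
via_in <- vector1[vector1 %in% 2]
via_in
# [1] 2
```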
If you are in RStudio and enter
?`[`
You will get the following explanation:
NAs in indexing
When extracting, a numerical, logical or character NA index picks an
unknown element and so returns NA in the corresponding element of a
logical, integer, numeric, complex or character result, and NULL for a
list. (It returns 00 for a raw result.)
When replacing (that is using indexing on the lhs of an assignment) NA
does not select any element to be replaced. As there is ambiguity as
to whether an element of the rhs should be used or not, this is only
allowed if the rhs value is of length one (so the two interpretations
would have the same outcome). (The documented behaviour of S was that
an NA replacement index ‘goes nowhere’ but uses up an element of
value: Becker et al p. 359. However, that has not been true of other
implementations.)
Try the logical operator & in that case:
vector1 = c(1,2,3,NA)
condition1<-(vector1==2 & !is.na(vector1) )
condition1
# [1] FALSE TRUE FALSE FALSE
vector1[condition1]
# [1] 2
The & operator returns TRUE only when both of its operands are TRUE, so the NA position becomes FALSE here.
identical is "The safe and reliable way to test two objects for being exactly equal. It returns TRUE in this case, FALSE in every other case." (see ?identical)
As it does not compare elementwise, you can use it in sapply to compare each element of vector1 to 2, i.e.:
condition1 = sapply(vector1, identical, y = 2)
which will give:
vector1[condition1]
[1] 2
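One caveat with identical, sketched below: it also compares types, so the integer 2L is not identical to the double 2. The vectors vector_dbl and vector_int are made up for illustration.

```r
vector_dbl <- c(1, 2, 3, NA)      # doubles, as in the question
vector_int <- c(1L, 2L, 3L, NA)   # hypothetical integer version

res_dbl <- sapply(vector_dbl, identical, y = 2)
res_dbl
# [1] FALSE  TRUE FALSE FALSE

# Integer elements are never identical to the double 2 ...
res_int <- sapply(vector_int, identical, y = 2)
res_int
# [1] FALSE FALSE FALSE FALSE

# ... but they are identical to the integer 2L.
res_int2 <- sapply(vector_int, identical, y = 2L)
res_int2
# [1] FALSE  TRUE FALSE FALSE
```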

How to count number of TRUE values in a logical vector before FALSE

While trying to find the number of TRUE values in a vector, I came across the first Google hit, but it does not fully meet my requirements. I want the number of TRUE values in a vector before the first FALSE, if any. I have a vector a <- c(TRUE,TRUE,TRUE,FALSE,TRUE, TRUE) and want to count the TRUE values before the FALSE, so the output will be three. Note that it should also work if there are only TRUE values in the vector.
Here is a short way:
sum(cumprod(a))
# [1] 3
where cumprod gives a cumulative product (of zeros and ones in this case); so, it eliminates all TRUE's after the first FALSE, as in
cumprod(a)
# [1] 1 1 1 0 0 0
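A quick sketch checking the edge cases the question mentions, with the one-liner wrapped in a helper (count_leading_true is a made-up name):

```r
# Number of TRUE values before the first FALSE.
count_leading_true <- function(x) sum(cumprod(x))

a <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
count_leading_true(a)               # 3: stops at the first FALSE
count_leading_true(c(TRUE, TRUE))   # 2: only TRUE values, counts them all
count_leading_true(c(FALSE, TRUE))  # 0: starts with FALSE
count_leading_true(logical(0))      # 0: empty vector
```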
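Alternatively, which.min gives the position of the first FALSE. Appending a FALSE sentinel makes this also work when the vector contains only TRUE values (plain which.min(a) - 1 would return 0 in that case):
which.min(c(a, FALSE)) - 1
# [1] 3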

How do I count the number of pattern occurrences, if the pattern includes NA, in R?

I have a string of 0's, 1's and NA's like so:
string<-c(0,1,1,0,1,1,NA,1,1,0,1,1,NA,1,0,
0,1,0,1,1,1,NA,1,0,1,NA,1,NA,1,0,1,0,NA,1)
I'd like to count the number of times the PATTERN "1-NA-1" occurs. In this instance, I would like to get the count 5.
I've tried table(string) and tried to adapt it, but nothing seems to work. I would appreciate anyone's help!
# some ugly code, but it seems to work
sum( head(string, -2) == 1 & is.na(head(string[-1],-1))
& string[-1:-2] == 1, na.rm = TRUE)
Something like:
x <- which(is.na(string))
x <- x[!x %in% c(1,length(string))]
length(x[string[x-1] & string[x+1]])
# [1] 5
-- REASONING --
First, we check which values of string are NA with is.na(string). Then we find those indices with which and store them in x.
As @Rick mentions, if the first/last value is NA it would lead to problems in our next step. So, we make sure that those are removed (as they shouldn't count anyway).
Next, we want to find the situation where both string[x-1] and string[x+1] are 1. In other words, 1 & 1. Note that FALSE and TRUE can be evaluated as 0 and 1 respectively. So, if you type 1 == TRUE you will get TRUE. If you type 1 & 1 you will also get TRUE back. So, string[x-1] & string[x+1] will return TRUE when both are 1, and FALSE otherwise. We basically obtain a logical vector, and subset x with that vector to get all positions in x that satisfy our search. Then we use length to determine how many there are.
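The index-based approach above assumes the neighbours of an NA are never NA themselves; with two consecutive NAs the logical index contains NA and the count is inflated. A more defensive variant might look like the sketch below, which compares three aligned slices and guards every comparison with is.na. The name count_1_NA_1 is made up for illustration.

```r
string <- c(0,1,1,0,1,1,NA,1,1,0,1,1,NA,1,0,
            0,1,0,1,1,1,NA,1,0,1,NA,1,NA,1,0,1,0,NA,1)

# Count positions where a 1 is followed by NA and then by another 1.
count_1_NA_1 <- function(x) {
  n <- length(x)
  if (n < 3) return(0L)
  left   <- x[1:(n - 2)]
  middle <- x[2:(n - 1)]
  right  <- x[3:n]
  # !is.na(...) & ... == 1 is FALSE (never NA) when the value is missing
  sum(!is.na(left) & left == 1 & is.na(middle) & !is.na(right) & right == 1)
}

count_1_NA_1(string)
# [1] 5
count_1_NA_1(c(NA, 1, NA, NA, 1))   # consecutive NAs: no match
# [1] 0
```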

Types and comparisons in R

I've been working with R for a month or so, and my comprehension of some subtleties is still quite superficial.
I have had an issue, which I managed to solve (details below), but I still can't explain precisely why it did not work with the first solution.
Note that the example below makes no practical sense for I have simplified it as much as possible so that the problem is quite clear.
ISSUE :
Given a data frame with 4 columns (email, first, last, company) :
> users <- data.frame(matrix(vector(), 0, 4, dimnames=list(c(), c("email", "first", "last", "company"))), stringsAsFactors=F)
> users[1,] <- c("robert#redford.com", "Robert", "Redford", "Paramount")
> users[2,] <- c("julia#roberts.com", "Erin", "B.", "Hinkley")
> users[3,] <- c("matt#damon.com", "Will", "H.", "Stanford")
> users[4,] <- c("john#malkovitch.com", "John", "M.", "JM")
I take one particular row :
> user <- users[3,]
When I try to subset the data frame on a criterion that should return the previously mentioned row, it returns no result.
> users[users$email == user["email"],]
[1] email first last company
<0 rows> (or 0-length row.names)
I instantly thought it was a casting issue (sorry for this bad one)
> users[users$email == as.character(user["email"]),]
email first last company
3 matt#damon.com Will H. Stanford
However, when I tried to figure out where exactly the issue was, and tried this :
> users[users$email == "matt#damon.com",]
email first last company
3 matt#damon.com Will H. Stanford
> user["email"] == "matt#damon.com"
email
3 TRUE
> users[3,]$email == user$email
[1] TRUE
I got quite confused :
First, I thought about it as a math problem : if A == B and B == C, then A == C (according to Captain Obvious). So, just replacing a member A by another member B which is supposed to be equal to A (given the "TRUE" statement) in some expression should have no impact on the result of this expression.
3 TRUE != [1] TRUE. I think [1] TRUE is a logical vector of size 1 whose first element is TRUE, while 3 TRUE is a (1x1) matrix row whose "email" column value is TRUE.
My problem is with consistency: either two objects of equal content but different types should be equal, or they should be different. I have a problem with "sometimes there is type inference, and sometimes not". Is there a rule behind this behavior that I can't see? (I guess there is one.)
Another expression of the behavior I'd like to get is this one :
> unique(users$email) == "matt#damon.com"
[1] FALSE FALSE TRUE FALSE
> unique(users$email) == user["email"]
email
3 FALSE
Obviously R does get what I want (considering the fact that it gives me the matching row). But I can't explain (nor use) the result of the second statement.
Any explanations / thoughts?
In normal list situations, use [[ to extract the element itself:
users$email == user[["email"]]
However, in data.frames things get inconsistent / a lot worse!
tdf = data.frame(matrix(1:100,10,10))
tdf[] # returns the whole data.frame
tdf[1] # returns a data.frame containing the first column
tdf[1,1] # returns a single element (a length-1 vector)
tdf[,1] # returns the first column as a vector
tdf[1,] # returns a data.frame of the first row # eeeeeugh... that is odd....
tdf[2:4] # returns a data.frame with 3 columns
tdf[1,2:4] # returns a data.frame of the first row and 3 columns
tdf[2:4,2:4] # returns a 3x3 data.frame
tdf[2:4,1] # returns a vector of rows 2:4 of the 1st column
tdf[,2:4] # returns a data.frame with 3 columns
Then there is also the double [[ ]].
Do note that in data.frames things get horribly annoying and fugly:
tdf[[1]] # gives the first column as a vector
tdf[[1,1]] # gives the first element
Pretty much all other combinations give errors,
and assigning stuff to a data.frame or matrix is an even bigger mess!
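To tie this back to the question, here is a minimal sketch of the difference between [ and [[ on a one-row data frame. The users data below is a simplified, hypothetical stand-in for the question's data, and drop = FALSE is the standard argument of [ for keeping the data.frame shape.

```r
# Hypothetical, simplified version of the question's data
users <- data.frame(
  email = c("ann@example.com", "bob@example.com", "cy@example.com"),
  first = c("Ann", "Bob", "Cy"),
  stringsAsFactors = FALSE
)
user <- users[3, ]

class(user["email"])     # "data.frame": single brackets keep the container
class(user[["email"]])   # "character":  double brackets extract the column
class(user$email)        # "character":  $ behaves like [[ here

# Comparing against the extracted character value selects the row.
match_row <- users[users$email == user[["email"]], ]

tdf <- data.frame(matrix(1:100, 10, 10))
class(tdf[, 1])                 # "integer":    a single column is dropped to a vector
class(tdf[, 1, drop = FALSE])   # "data.frame": drop = FALSE keeps the shape
identical(tdf[[1]], tdf[, 1])   # TRUE: both give the first column
```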

Couldn't reduce the looping variable inside the "for" loop in R

I have a for loop to do a matrix manipulation in R. When some check is true, I need to process the same row again, which means i needs to be reduced by 1.
for (i in 1:10) {
  if (some_check) {  # placeholder for the actual condition
    i = i - 1
  }
}
However, i is not actually reduced. For example, in the 5th iteration I reduce i to 4, so the next iteration should use 5 again, but it uses 6.
Please advise.
My intention is:
Checking the first column values of a matrix, if I find any duplicate value, I take the second column value, append it to the second column of the first row with that key, and remove the duplicate row. So, when I remove a row, I do not need to increase i. (This is just a map-reduce method: append values of the same key.)
The loop variable in an R for loop is reassigned from the sequence at the start of every iteration, so changing it inside the body does not affect the next iteration. What you have written would be solved completely differently in normal R code. The exact solution depends on the actual problem; there isn't a generic, direct replacement (except by replacing the whole thing with a while loop, but this is both ugly and probably unnecessary).
To illustrate this, consider these two typical examples.
Assume you want to filter all duplicated elements from a list. Instead of looping over the list and copying all duplicated elements, you can use the duplicated function which tells you, for each element, whether it’s a duplicate.
Secondly, you use standard R subsetting syntax to select just those elements which are not a duplicate:
x = x[! duplicated(x)]
(This example works on a one-dimensional vector or list, but it can be generalised to more dimensions.)
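Applied to the key/value situation described in the question, the same idea might look like the sketch below. The keys and vals vectors are made-up stand-ins for the matrix's first and second columns, and joining the values with a comma is an assumption about what "append" should mean.

```r
# Hypothetical stand-ins for the first and second matrix columns
keys <- c("a", "b", "a", "c", "b")
vals <- c("1", "2", "3", "4", "5")

# Unique keys, in order of first appearance
unique_keys <- keys[!duplicated(keys)]
unique_keys
# [1] "a" "b" "c"

# Group the values by key and paste each group into one string.
# split() orders the groups by sorted key.
merged <- vapply(split(vals, keys), paste, character(1), collapse = ",")
merged
#     a     b     c
# "1,3" "2,5"   "4"
```

No loop index needs to be adjusted because no row is ever removed while iterating.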
For a more complex case, let’s say that you have a vector of numbers and, for every even number in the vector, you want to double the preceding number (this is highly artificial but in signal processing you might face similar problems). In other words:
input = c(1, 3, 2, 5, 6, 7, 1, 8)
output = ???
output
# [1] 1 6 2 10 6 7 2 8
… we want to fill in ???. In the first step, we check which numbers are even:
even = input %% 2 == 0
# [1] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE
Next, we shift the result down – because we want to know whether the next number is even – by removing the first element, and appending a dummy element (FALSE) at the end.
even = c(even[-1], FALSE)
# [1] FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE
And now we can multiply just these inputs by two:
output = input
output[even] = output[even] * 2
There, done.
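For completeness, a small sketch of the behaviour the question ran into, followed by the while-loop alternative mentioned earlier. The vectors and variable names are illustrative only.

```r
# Assigning to the loop variable does not change which value comes next.
seen <- integer(0)
for (i in 1:3) {
  seen <- c(seen, i)
  i <- i - 1        # overwritten at the start of the next iteration
}
seen
# [1] 1 2 3

# A while loop gives full control over the index: advance only when
# nothing was removed at the current position.
x <- c(5, 5, 7, 5, 9)
i <- 2
while (i <= length(x)) {
  if (x[i] %in% x[seq_len(i - 1)]) {
    x <- x[-i]      # drop the duplicate and stay at the same index
  } else {
    i <- i + 1
  }
}
x
# [1] 5 7 9
```

The while loop gives the same result as x[!duplicated(x)] on the original vector, which is why the vectorised form is usually preferable.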

Resources