Split logical vector based on FALSE/TRUE patterns

Given the logical vector x:
x <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
How can I split x at every FALSE/TRUE pattern? (The TRUE/FALSE case then follows by applying the same approach to !x.)
That is, each piece consists of a run FALSE, FALSE, ..., FALSE followed by a run TRUE, TRUE, ..., TRUE, and it ends as soon as a FALSE appears again. Said differently, we split every time we move from a TRUE to a FALSE.
Here is what I ended up with:
p <- which(diff(x)==-1)+1
split(x, cumsum(seq_along(x) %in% p))
The output is as expected:
# $`0`
# [1] FALSE FALSE FALSE TRUE TRUE
# $`1`
# [1] FALSE FALSE TRUE
# $`2`
# [1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE
# $`3`
# [1] FALSE TRUE
Is there another solution to this problem, ideally a more efficient one?
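One possible alternative, sketched here rather than taken from an answer: instead of collecting positions with which() and matching them with %in%, mark every FALSE whose predecessor is TRUE directly, using a lagged copy of x, and take the cumulative sum of those marks as the group id. The helper name split_false_true is introduced here for illustration, and it has not been benchmarked against the version above.

```r
# split_false_true() is a hypothetical helper name, not from the question.
# It splits a logical vector every time a TRUE is followed by a FALSE.
split_false_true <- function(x) {
  # Previous element at each position; the first element has no predecessor,
  # so it is padded with FALSE (the first element never starts a new group)
  prev <- c(FALSE, x[-length(x)])
  # A new group starts at every FALSE that directly follows a TRUE
  starts <- !x & prev
  # Running count of group starts gives the group ids 0, 1, 2, ...
  split(x, cumsum(starts))
}

x <- c(FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE,
       FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
split_false_true(x)
```

It yields the same four groups, named "0" to "3", as the which()/%in% version; whether it is measurably faster would need to be checked on real data.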

Related

Find the *first* longest sequence of TRUE in a boolean vector

I need to find the first longest sequence of TRUE in a boolean vector. Some examples:
bool <- c(FALSE, TRUE, FALSE, TRUE)
# should become
c(FALSE, TRUE, FALSE, FALSE)
bool <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
# should become
c(FALSE, FALSE, FALSE, TRUE, TRUE)
bool <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
# should become
c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
The answer from here handles all my cases correctly except the first of the examples above.
How can I change
with(rle(bool), rep(lengths == max(lengths[values]) & values, lengths))
so that it also handles the first example above correctly?
One option could be:
with(rle(bool), rep(seq_along(values) == which.max(lengths * values), lengths))
Results for the first vector:
[1] FALSE TRUE FALSE FALSE
For the second:
[1] FALSE FALSE FALSE TRUE TRUE
For the third:
[1] FALSE TRUE TRUE FALSE FALSE FALSE
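One caveat with the which.max() one-liner, following from how which.max() works: if bool contains no TRUE at all, every run gets weight 0, which.max() returns 1, and the first (FALSE) run is switched to TRUE, so c(FALSE, FALSE) would come back as c(TRUE, TRUE). A minimal guarded variant additionally requires the selected run to be a TRUE run; first_longest_true is a name introduced here for illustration.

```r
# first_longest_true() is a hypothetical helper name used for this sketch.
first_longest_true <- function(bool) {
  r <- rle(bool)
  # Weight each run by its length; FALSE runs get weight 0.
  # which.max() returns the FIRST maximum, i.e. the first longest TRUE run.
  keep <- seq_along(r$values) == which.max(r$lengths * r$values)
  # '& r$values' guards the all-FALSE case, where which.max() would
  # otherwise select the first (FALSE) run
  rep(keep & r$values, r$lengths)
}

first_longest_true(c(FALSE, TRUE, FALSE, TRUE))  # returns c(FALSE, TRUE, FALSE, FALSE)
first_longest_true(c(FALSE, FALSE))              # returns c(FALSE, FALSE)
```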
Not elegant but might work:
bool <- c(FALSE, TRUE, FALSE, TRUE)
tt <- rle(bool)
t1 <- which.max(tt$lengths[tt$values])
tt$values[tt$values][-t1] <- FALSE
inverse.rle(tt)
#[1] FALSE TRUE FALSE FALSE
And as a function:
fun <- function(bool) {
  tt <- rle(bool)
  t1 <- which.max(tt$lengths[tt$values])
  tt$values[tt$values][-t1] <- FALSE
  inverse.rle(tt)
}
fun(c(FALSE, TRUE, FALSE, TRUE))
#[1] FALSE TRUE FALSE FALSE
fun(c(FALSE, TRUE, FALSE, TRUE, TRUE))
#[1] FALSE FALSE FALSE TRUE TRUE
fun(c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE))
#[1] FALSE TRUE TRUE FALSE FALSE FALSE
fun(FALSE)
#[1] FALSE
fun(logical(0))
#logical(0)

Extract all TRUE elements from a list in R

I have a list:
lst <- list(list(c(TRUE, TRUE, TRUE, TRUE),
c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
c(TRUE, TRUE)),
list(c(FALSE, FALSE),
c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
c(TRUE, TRUE,TRUE)))
I want to extract only the elements that are entirely TRUE, together with their indices.
The result should be:
[[1]][[1]]
[1] TRUE TRUE TRUE TRUE
[[1]][[3]]
[1] TRUE TRUE
[[2]][[3]]
[1] TRUE TRUE TRUE
We loop through the outer list, name each inner element by its position (so that the index is kept), and then keep only the elements that are all TRUE:
lapply(lst, function(x) {x1 <- setNames(x, seq_along(x)); x1[sapply(x1, all)] })
#[[1]]
#[[1]]$`1`
#[1] TRUE TRUE TRUE TRUE
#[[1]]$`3`
#[1] TRUE TRUE
#[[2]]
#[[2]]$`3`
#[1] TRUE TRUE TRUE
Another option is modify_depth() from purrr. Note that it keeps every position and leaves a zero-length vector (logical(0)) wherever the condition is not satisfied:
library(purrr)
lst %>%
  modify_depth(2, ~ .x[all(.x)])
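If the indices themselves are what you need, a base-R sketch can collect them as (outer, inner) pairs and then pull out the matching vectors with Map(). The names idx and vals are introduced here for illustration; the sketch assumes at least one match exists (with none, do.call(rbind, ...) returns NULL and the column lookups would fail).

```r
lst <- list(list(c(TRUE, TRUE, TRUE, TRUE),
                 c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
                 c(TRUE, TRUE)),
            list(c(FALSE, FALSE),
                 c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
                 c(TRUE, TRUE, TRUE)))

# One row per inner vector that is entirely TRUE: (outer index, inner index)
idx <- do.call(rbind, lapply(seq_along(lst), function(i) {
  inner <- which(vapply(lst[[i]], all, logical(1)))
  if (length(inner) > 0) cbind(outer = i, inner = inner) else NULL
}))
idx

# Extract the matching vectors in the same order as the rows of idx
vals <- Map(function(i, j) lst[[i]][[j]], idx[, "outer"], idx[, "inner"])
vals
```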

compare vector to dataframe by applying identical with booleans

Apply does not work, but using identical directly does:
Create the dataframe
gp130 <- data.frame(matrix(nrow=7,ncol=6))
rownames(gp130) <- c("ABCDEF","ABCDE","ABCD","ABC","AB","BCDEF","MUCV5")
names(gp130) <- c("A","B","C","D","E","F")
gp130$A <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
gp130$B <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
gp130$C <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
gp130$D <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
gp130$E <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
gp130$F <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE)
Print the data frame
gp130
A B C D E F
ABCDEF TRUE TRUE TRUE TRUE TRUE TRUE
ABCDE TRUE TRUE TRUE TRUE TRUE FALSE
ABCD TRUE TRUE TRUE TRUE FALSE FALSE
ABC TRUE TRUE TRUE FALSE FALSE FALSE
AB TRUE TRUE FALSE FALSE FALSE FALSE
BCDEF FALSE TRUE TRUE TRUE TRUE TRUE
MUCV5 FALSE FALSE FALSE FALSE FALSE FALSE
Create a vector that matches column C
myv <- c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) ##matches column C
apply(gp130, 2, identical, myv)
A B C D E F
FALSE FALSE FALSE FALSE FALSE FALSE
Why is C FALSE?
identical(gp130$C, myv)
[1] TRUE
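OK, I think I've got it. apply() first converts the data frame to a matrix, so each column it passes to the function is a named vector (the names are the row names), whereas sapply() iterates over the data frame's columns, which are plain unnamed vectors. identical() also compares attributes such as names, so under apply() it returns FALSE for every column, including C. See the output of the two versions below.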
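apply(gp130, 2, function(x){
  identical(x, myv)  # result is discarded; only the printed output matters here
  print(x)           # prints a named vector (names = row names)
  print(myv)
})
sapply(gp130, function(x){
  identical(x, myv)
  print(x)           # prints an unnamed vector
  print(myv)
})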
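Given that explanation, two ways to get the intended comparison, sketched here: pass the columns through sapply() directly, or keep apply() but drop the row-name names with unname() before calling identical(). The data frame is rebuilt compactly with data.frame(..., row.names = ...) so the snippet runs on its own; by_sapply and by_apply are names introduced for illustration.

```r
gp130 <- data.frame(
  A = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  C = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  D = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  E = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE),
  F = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  row.names = c("ABCDEF", "ABCDE", "ABCD", "ABC", "AB", "BCDEF", "MUCV5")
)
myv <- c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)  # matches column C

# sapply() hands each column over as a plain, unnamed vector
by_sapply <- sapply(gp130, identical, myv)

# apply() goes through a matrix, so each column carries the row names;
# unname() removes them before the comparison
by_apply <- apply(gp130, 2, function(col) identical(unname(col), myv))

by_sapply
by_apply
```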

Multiple different random samples in R

I'm new to R and I've got a question:
choice <- c(TRUE, FALSE, FALSE, FALSE)
rep(sample(choice, size = 4, replace=FALSE), times = n)
always repeats the same vector n times, e.g. (FALSE, TRUE, FALSE, FALSE).
However, I want n different random samples of the vector choice combined into one new vector (replace must be FALSE, because exactly one element in each group of 4 should be TRUE).
Which function should I choose? I'm not allowed to use for-loops.
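You can use replicate(), which re-evaluates the sample() call n times, whereas rep() only copies the result of a single sample() call. It returns a matrix with one sample per column, which you can then turn into a vector.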
choice <- c(TRUE, FALSE, FALSE, FALSE)
n <- 3
set.seed(42) # for reproducibility
as.vector(replicate(n, sample(choice, size = 4, replace=FALSE)))
#[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
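For this particular choice vector, a fully vectorised alternative is also possible. The sketch assumes choice contains exactly one TRUE: a random permutation of it is then fully determined by where the TRUE lands, and that position is uniform over the 4 slots, so drawing one random position per block gives the same distribution as sample(choice). It will not reproduce the same draws as the replicate() version for a given seed, and it does not generalise to vectors with several TRUE values.

```r
choice <- c(TRUE, FALSE, FALSE, FALSE)
n <- 3
k <- length(choice)

# Assumption: choice has exactly one TRUE, so each block of length k
# is described completely by the position of that TRUE.
pos <- sample.int(k, n, replace = TRUE)   # one random position per block
out <- logical(k * n)                     # start with all FALSE
out[(seq_len(n) - 1L) * k + pos] <- TRUE  # switch on one element per block
out
```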

Replace/Modify values in logical vector (Pattern Matching)

The question looks simple, but I haven't figured out how to do it in R.
I want to modify a logical vector depending on patterns of its values. There are two modification steps:
If there is a single FALSE surrounded by TRUE values, switch it to TRUE.
If there are fewer than 3 successive TRUE values, switch them to FALSE.
Everything else should remain as it is. Here's an example:
# input
x = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
# output
xo = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
cbind(x, xo) gives:
x xo
[1,] FALSE FALSE
[2,] TRUE FALSE
[3,] FALSE FALSE
[4,] FALSE FALSE
[5,] TRUE FALSE
[6,] TRUE FALSE
[7,] FALSE FALSE
[8,] FALSE FALSE
[9,] TRUE TRUE
[10,] TRUE TRUE
[11,] TRUE TRUE
[12,] FALSE TRUE
[13,] TRUE TRUE
[14,] TRUE TRUE
[15,] FALSE FALSE
[16,] FALSE FALSE
[17,] TRUE TRUE
[18,] TRUE TRUE
[19,] TRUE TRUE
[20,] TRUE TRUE
[21,] FALSE FALSE
I don't want to use a for loop because it's slow and I would need a lot of if statements.
Is there a better way to get this working?
Here is an approach:
#sample data
x <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
First, find the indices of the FALSE values that need to be changed to TRUE, i.e. FALSE values that both follow and are followed by a TRUE value:
tochange <-
  intersect(
    intersect(
      which(x == FALSE),      # not strictly necessary
      which(diff(x) == 1)     # FALSEs followed by a TRUE
    ),
    which(diff(x) == -1) + 1  # FALSEs that follow a TRUE
  )
Change the values
x[tochange] <- TRUE
Next, look for runs of TRUE (and FALSE) shorter than 3 and set them to FALSE. Setting a short FALSE run to FALSE changes nothing, so only the short TRUE runs are actually affected.
xrle <- rle(x)
xrle$values[xrle$lengths < 3] <- FALSE
newx <- inverse.rle(xrle) # thanks to Frank for pointing out inverse.rle!
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
#[10] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE
#[19] TRUE TRUE FALSE
You can try rle (thanks to @Frank for the modification):
xtmp <- inverse.rle(within.list(rle(x), {
  n <- length(values)
  values[lengths == 1 & !values & !seq_len(n) %in% c(1, n)] <- TRUE
}))
res <- inverse.rle(within.list(rle(xtmp),
  values[lengths < 3 & values] <- FALSE
))
identical(xo,res) # TRUE
Try:
make_true <- function(x) {
  string <- paste(as.numeric(x), collapse = '')
  ans <- gregexpr('(?=(101))', string, perl = TRUE)
  x[as.numeric(ans[[1]]) + 1L] <- TRUE
  res <- rle(x)
  res$values[res$lengths < 3] <- FALSE
  inverse.rle(res)
}
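The function takes advantage of the fact that TRUE and FALSE coerce to 1 and 0, so the vector can be pasted into a string of digits. It then searches that string for the pattern "101"; the zero-width lookahead (?=(101)) also finds overlapping matches, and each match position + 1 is the index of an isolated FALSE.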
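The two steps can also be written with lagged copies of x instead of diff()/intersect() or a regular expression. This is a sketch under the assumption that positions beyond either end of x count as FALSE, so a leading or trailing FALSE is never switched (consistent with the answers above). smooth_runs and its min_run argument are names introduced here, not part of any answer.

```r
# smooth_runs() and min_run are hypothetical names used for this sketch.
smooth_runs <- function(x, min_run = 3) {
  n <- length(x)
  # Lagged copies of x; positions outside the vector count as FALSE
  prev <- c(FALSE, x[-n])
  nxt  <- c(x[-1], FALSE)
  # Step 1: a FALSE with a TRUE on both sides becomes TRUE
  x[!x & prev & nxt] <- TRUE
  # Step 2: runs of TRUE shorter than min_run become FALSE
  r <- rle(x)
  r$values[r$values & r$lengths < min_run] <- FALSE
  inverse.rle(r)
}

x <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE,
       FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
xo <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
        TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
identical(smooth_runs(x), xo)  # TRUE
```

Because prev and nxt are computed before step 1 modifies x, all isolated FALSE values are detected against the original vector, as in the diff()-based answer.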
