R check if list of lists contains specific list - r

There are three lists:
a = list(1,2)
b = list(2,3)
c = list(a,b)
The command a %in% c yields FALSE FALSE. The result I would like to see is TRUE since a is an element of list c. How do I achieve this?

Check whether each component is identical to a and return TRUE if any of those comparisons are TRUE.
any(sapply(c, identical, a))
## [1] TRUE
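A type-stable variant of the same idea, as a minimal sketch: vapply() with FUN.VALUE = logical(1) always returns a logical vector, whereas sapply() returns an empty list when the outer list is empty. The helper name contains_list is made up for illustration and is not part of base R.

```r
# Hypothetical helper: TRUE if `target` is identical to some element of `lst`.
contains_list <- function(lst, target) {
  any(vapply(lst, identical, FUN.VALUE = logical(1), y = target))
}

a <- list(1, 2)
b <- list(2, 3)
c <- list(a, b)

contains_list(c, a)          # TRUE
contains_list(c, list(3, 4)) # FALSE
contains_list(list(), a)     # FALSE, no error on an empty list
```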

Alternatively, wrap a in list() so that %in% treats it as a single element to look up:
list(a) %in% c
Examples:
a = list(1,2)
b = list(2,3)
c = list(a,b)
y = list(3,4)
z = list(1)
list(a) %in% c # TRUE
list(b) %in% c # TRUE
list(y) %in% c # FALSE
list(z) %in% c # FALSE

Related

Is there a better way to check if all elements in a list are named?

I want to check if all elements in a list are named. I've come up with this solution, but I wanted to know if there is a more elegant way to check this.
x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list(1, 2)
# FALSE: all elements are named.
any(stringr::str_length(methods::allNames(x)) == 0L)
# TRUE: at least one element is not named. Throw an error here.
any(stringr::str_length(methods::allNames(y)) == 0L)
# TRUE: at least one element is not named. Throw an error here.
any(stringr::str_length(methods::allNames(z)) == 0L)
I am not sure whether the following base R code works for your general cases, but it seems to work for the ones in your post.
Define a function f to check the names:
f <- function(lst) length(lst) == sum(names(lst) != "", na.rm = TRUE)
and you will see
> f(x)
[1] TRUE
> f(y)
[1] FALSE
> f(z)
[1] FALSE
We can create a function that checks whether the names attribute is NULL or (||) contains a blank ("") name, and negates (!) the result:
f1 <- function(lst1) is.list(lst1) && !(is.null(names(lst1)) || "" %in% names(lst1))
Checking:
f1(x)
#[1] TRUE
f1(y)
#[1] FALSE
f1(z)
#[1] FALSE
Or with allNames from the methods package, which returns "" for every unnamed element:
f2 <- function(lst1) is.list(lst1) && !("" %in% methods::allNames(lst1))
Checking:
f2(x)
#[1] TRUE
f2(y)
#[1] FALSE
f2(z)
#[1] FALSE
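Another base R possibility, sketched here with nzchar(), which tests for non-empty strings. The function name all_named is hypothetical, and treating an empty list as "not all named" is an assumption that may not fit every use case.

```r
# Hypothetical helper: TRUE only if every element has a non-empty, non-NA name.
all_named <- function(lst) {
  nms <- names(lst)
  # names() is NULL when no element is named (this includes the empty list)
  !is.null(nms) && !anyNA(nms) && all(nzchar(nms))
}

x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list(1, 2)

all_named(x) # TRUE
all_named(y) # FALSE
all_named(z) # FALSE
```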

discard elements from list recursively r

I have a nested list with some NAs, and I want to discard the NAs from the list.
purrr::discard does not work recursively:
l <- list(a = NA, b = T, c = c(F, F))
purrr::discard(l, is.na)
Throws this error:
Error: Predicate functions must return a single TRUE or FALSE, not a logical vector of length 2
I would like to end up with the following list in this case:
l2 <- list(b = T, c = c(F, F))
(purrr version: 0.3.2)
is.na() is vectorised: is.na(c(T,T,T)) returns c(F,F,F). To use discard, the predicate needs to return a single TRUE or FALSE for each list element, as the error says.
This should work:
purrr::discard(l, function(x) all(is.na(x)))
This drops an element only if all of its values are NA.
To remove every NA, including those inside elements that also hold other values, this should work:
library(tidyverse)
l <- list(a = NA, b = c(T,NA), c = c(F, F)) # Define a list
lapply(l, function(x) x[!is.na(x)]) %>% # Remove all nested NAs
  purrr::discard(., function(x) length(x) == 0) # Remove all empty elements
EDIT (another option): drop every element that contains at least one NA. With the original l from the question:
purrr::discard(l, function(x) isTRUE(anyNA(x)))
$b
[1] TRUE
$c
[1] FALSE FALSE
If you already know which elements are NA (here, a), you can zap them by name:
purrr::list_modify(l,a=purrr::zap())
$b
[1] TRUE
$c
[1] FALSE FALSE
EDIT 2
If you want to remove all nested NAs, you can write a helper, zap_if():
zap_if <- function(x) {
  unlist(lapply(x, function(z) z[!is.na(z)]))
}
purrr::map(l, zap_if)
Result:
$a
[1] 1
$b
[1] TRUE
$c
[1] FALSE FALSE
Data for the zap_if part:
l <- list(a = c(NA,1), b = T, c = c(F, F))
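The snippets above handle one level of nesting. For lists nested inside lists, a minimal base R sketch could recurse explicitly. It assumes that elements left empty after NA removal should be dropped; drop_na_recursive is a hypothetical name, not a purrr function.

```r
# Hypothetical helper: remove NAs at every depth and drop elements left empty.
drop_na_recursive <- function(x) {
  if (is.list(x)) {
    x <- lapply(x, drop_na_recursive) # recurse into each element
    x[lengths(x) > 0]                 # drop elements that became empty
  } else {
    x[!is.na(x)]                      # atomic vector: keep non-NA values
  }
}

nested <- list(a = NA, b = TRUE, c = c(FALSE, FALSE),
               d = list(e = NA, f = c(1, NA)))
cleaned <- drop_na_recursive(nested)
str(cleaned)
```

Here a and d$e disappear entirely, while d$f keeps only its non-NA value.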

Replace elements of vector by vector

I want to replace a few elements of one vector with the whole of a second vector. The condition is that the replaced elements of the first vector are equal to the values in a third vector. Here is an example:
a <- 1:10
b <- 5:7
v <- rnorm(2, mean = 1, sd = 5)
my output should be
c(a[1:4], v, a[8:10])
I have already tried
replace(a, a == b, v)
a[a == b] <- v
but with little success. Can anyone help?
The == operator is best used to compare vectors of the same length, or when one of the vectors has length 1.
Try this, and notice that in neither case do you get the positional match that you desire.
> a == b
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In a == b : longer object length is not a multiple of shorter object length
> b == a
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In b == a : longer object length is not a multiple of shorter object length
Instead, use match() - this gives you the index position where there is a match in the values.
> match(b, a)
[1] 5 6 7
Then:
a <- 1:10
b <- 5:7
v <- rnorm(3, mean=1, sd=5)
a[match(b, a)] <- v
The results:
a
[1] 1.0000000 2.0000000 3.0000000 4.0000000 -4.6843669 0.9014578 -0.7601413 8.0000000
[9] 9.0000000 10.0000000
Here's another option:
a[a %in% b] <- v
In the example described in the OP there are three common values in the vectors a and b, while v <- rnorm(2, mean = 1, sd = 5) contains only 2 numbers, so the vector v will be recycled and a warning will be issued.
The warning and recycling can be prevented, e.g., by defining v as
v <- rnorm(sum(a %in% b), mean = 1, sd = 5)
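The question's desired output, c(a[1:4], v, a[8:10]), is shorter than a because three elements are replaced by a two-element v. In-place assignment cannot change the length, so one possible sketch splices the vector instead. It assumes the matched values form one contiguous block in a and that at least one match exists; a fixed v is used here instead of rnorm() so the result is reproducible.

```r
a <- 1:10
b <- 5:7
v <- c(100, 200) # stand-in for rnorm(2, mean = 1, sd = 5)

idx <- which(a %in% b) # positions to replace (assumed contiguous, non-empty)
out <- c(a[seq_len(min(idx) - 1)], # everything before the block
         v,                        # the replacement, whatever its length
         a[-seq_len(max(idx))])    # everything after the block
out
```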

Difference of two character vectors with substring

I have two character vectors:
a <- c("da", "ba", "cs", "dd", "ek")
b <- c("zyc", "ulk", "mae", "csh", "ddi", "dada")
I want to remove the elements of b that have a substring match with any of the values in a, e.g.
grepl("da","dada") # TRUE
How would you go about doing this efficiently?
We can paste the elements of a into a single string with | as the delimiter, use that as the pattern in grepl, and negate (!) the result to subset b:
b[!grepl(paste(a, collapse="|"), b)]
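One caveat with the pasted pattern is that the elements of a are interpreted as regular expressions, so characters such as . act as wildcards. The sketch below contrasts this with literal matching via fixed = TRUE; the example vectors a2 and b2 are made up to show the difference and are not from the question.

```r
a2 <- c("a.b", "cs")
b2 <- c("axb", "a.b", "csh", "zyc")

# Regex matching: "." matches any character, so "axb" is removed as well.
regex_keep <- b2[!grepl(paste(a2, collapse = "|"), b2)]

# Literal matching: one grepl() per element of a2, combined with elementwise OR.
hit <- Reduce(`|`, lapply(a2, grepl, x = b2, fixed = TRUE))
literal_keep <- b2[!hit]

regex_keep   # "zyc"
literal_keep # "axb" "zyc"
```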
And another solution using a simple for loop:
sel <- rep(FALSE, length(b))
for (i in seq_along(a)) {
  sel <- sel | grepl(a[i], b, fixed = TRUE)
}
b[!sel]
Not as elegant as some of the other solutions (especially the one by akrun), but it shows that a for loop isn't always as slow in R as people believe:
fun1 <- function(a, b) {
  sel <- rep(FALSE, length(b))
  for (i in seq_along(a)) {
    sel <- sel | grepl(a[i], b, fixed = TRUE)
  }
  b[!sel]
}
fun2 <- function(a, b) {
  b[!apply(sapply(a, function(x) grepl(x, b, fixed = TRUE)), 1, sum)]
}
fun3 <- function(a, b) {
  b[-which(sapply(a, grepl, b, fixed = TRUE), arr.ind = TRUE)[, "row"]]
}
fun4 <- function(a, b) {
  b[!grepl(paste(a, collapse = "|"), b)]
}
library(stringr)
fun5 <- function(a, b) {
  b[!sapply(b, function(u) any(str_detect(u, a)))]
}
a <- c("da", "ba", "cs", "dd", "ek")
b <- c("zyc", "ulk", "mae", "csh", "ddi", "dada")
b <- rep(b, length.out = 1E3)
library(microbenchmark)
microbenchmark(fun1(a, b), fun2(a, b), fun3(a,b), fun4(a,b), fun5(a,b))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# fun1(a, b) 389.630 399.128 408.6146 406.007 411.7690 540.969 100 a
# fun2(a, b) 5274.143 5445.038 6183.3945 5544.522 5762.1750 35830.143 100 c
# fun3(a, b) 2568.734 2629.494 2691.8360 2686.552 2729.0840 2956.618 100 b
# fun4(a, b) 482.585 511.917 530.0885 528.993 541.6685 779.679 100 a
# fun5(a, b) 53846.970 54293.798 56337.6531 54861.585 55184.3100 132921.883 100 d
You could try the following:
b[!(+(apply(sapply(a, function(x) grepl(x,b)),1,sum)) > 0)]
[1] "zyc" "ulk" "mae"
'Peeling' this previous call from the inside, the results are the following. First, obtain a matrix of matches from the grepl call (with sapply):
sapply(a, function(x) grepl(x,b))
# da ba cs dd ek
#[1,] FALSE FALSE FALSE FALSE FALSE
#[2,] FALSE FALSE FALSE FALSE FALSE
#[3,] FALSE FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE TRUE FALSE FALSE
#[5,] FALSE FALSE FALSE TRUE FALSE
#[6,] TRUE FALSE FALSE FALSE FALSE
Note that the columns are the elements of a and the rows are the elements of b.
Then apply the function sum over the rows (in R, TRUE counts as 1 and FALSE as 0):
apply(sapply(a, function(x) grepl(x,b)),1,sum)
#[1] 0 0 0 1 1 1
Note that the row sums can be greater than 1 (when an element of b matches more than one element of a), so they are turned into a match indicator by comparing them with 0, using the wrapper from the first call:
+() > 0
With this, we can subset ([) b, but since we want the elements that do not match, we negate with the ! operator.
#full code:
step.one <- sapply(a, function(x) grepl(x,b))
step.two <- apply(step.one,1,sum)
step.three <- +(step.two > 0)
step.four <- !step.three
#finally:
b[step.four]
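As David shows in the comments, this is a much more elegant approach (note that it returns an empty vector when nothing in b matches, because negative indexing with an empty index selects nothing):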
b[-which(sapply(a, grepl, b), arr.ind = TRUE)[, "row"]]

R - Vectorized implementation of ternary operator?

The title says it about as well as I can. What I have:
A B
TRUE FALSE
FALSE TRUE
TRUE TRUE
what I want:
C
if(A[1]&&B[1]){some.value.here}else if(A[1]){other.value}else{another.value}
if(A[2]&&B[2]){some.value.here}else if(A[2]){other.value}else{another.value}
if(A[3]&&B[3]){some.value.here}else if(A[3]){other.value}else{another.value}
I've tried ifelse but only got atomic results, not vectors.
Using ifelse works fine with a little nesting. (It would have been nice to see your attempt, to figure out where you went wrong.)
A = c(TRUE, FALSE, TRUE)
B = c(FALSE, TRUE, TRUE)
C = ifelse(A & B, "both", ifelse(A, "A only", "not A"))
cbind(A, B, C)
# A B C
# [1,] "TRUE" "FALSE" "A only"
# [2,] "FALSE" "TRUE" "not A"
# [3,] "TRUE" "TRUE" "both"
If you have a data frame with two columns, try using conditionals.
As a placeholder for your real replacement values, I chose "justA", "justB", and "both".
df$result[df$A & df$B] <- "both"
df$result[df$A & !df$B] <- "justA"
df$result[df$B & !df$A] <- "justB"
df
A B result
1 TRUE FALSE justA
2 FALSE TRUE justB
3 TRUE TRUE both
4 FALSE TRUE justB
Data
df <- data.frame(A=sample(c(T,F), 4, T), B=sample(c(T,F), 4, T))
df$result <- NA
If A and B are vectors:
> A = c(TRUE, FALSE, TRUE)
> B = c(FALSE, TRUE, TRUE)
You can use mapply():
> mapply(function (x, y) ifelse(x && y, 1, 2), A, B)
[1] 2 2 1
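Since ifelse() and & are already vectorised, the element-by-element mapply() call is not strictly needed. A minimal sketch of the direct form, using the same A and B, for comparison:

```r
A <- c(TRUE, FALSE, TRUE)
B <- c(FALSE, TRUE, TRUE)

# Element-by-element version from the answer above
via_mapply <- mapply(function(x, y) ifelse(x && y, 1, 2), A, B)

# Vectorised version: & works elementwise, ifelse() picks per element
via_ifelse <- ifelse(A & B, 1, 2)

via_ifelse # 2 2 1
```

Both give the same values; the vectorised form avoids calling a function once per element.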
