Check if there is overlap between elements of a list - r

I have a list of integer vectors and I want to check that no value appears in more than one element.
set.seed(2)
x <- list(a=sample(10,3),b=sample(10,5),c=sample(10,7))
x
# $a
# [1] 2 7 5
# $b
# [1] 2 9 8 1 6
# $c
# [1] 5 10 9 2 8 1 7
For this example, each of the following situations fails the check: 1) 2 appears in all entries, 2) 5 appears in $a and $c, 3) 8 appears in $b and $c, 4) 1 appears in $b and $c, etc.
y <- list(a=c(1,3,5),b=c(7,4),c=c(6,10))
There is no overlap between the elements of y, so it passes the check.
The expected output is just TRUE/FALSE, indicating whether the list passes the check.

You can convert the list to a vector with unlist and then check if any elements are duplicated in the vector with any and duplicated.
!any(duplicated(unlist(x)))
# [1] FALSE
!any(duplicated(unlist(y)))
# [1] TRUE
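One caveat: unlist followed by duplicated also flags a value that is repeated inside a single element. If only overlap between different elements should fail the check, a minimal sketch is to de-duplicate each element first. The helper name no_overlap and the list z are made up for illustration, and x is written out literally so the example does not depend on the random seed.

```r
# Hypothetical helper: TRUE when no value is shared between different elements.
no_overlap <- function(lst) {
  # unique() per element, so repeats inside one element are not counted
  !anyDuplicated(unlist(lapply(lst, unique), use.names = FALSE))
}

x <- list(a = c(2, 7, 5), b = c(2, 9, 8, 1, 6), c = c(5, 10, 9, 2, 8, 1, 7))
y <- list(a = c(1, 3, 5), b = c(7, 4), c = c(6, 10))
z <- list(a = c(1, 1, 3), b = c(7, 4))  # repeat inside one element only

no_overlap(x)  # FALSE
no_overlap(y)  # TRUE
no_overlap(z)  # TRUE, whereas !any(duplicated(unlist(z))) is FALSE
```

anyDuplicated stops at the first duplicate it finds, so it can be a little cheaper than any(duplicated(...)) on long vectors.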

Related

Replace empty element in a list using sapply

I have two lists. The first one has an empty element. I'd like to replace that empty element with the first vector of the third list element of another list.
l1 <- list(a=1:3,b=4:9,c="")
l2 <- list(aa=11:13,bb=14:19,cc=data.frame(matrix(100:103,ncol=2)))
l1[sapply(l1, `[[`, 1)==""] <- l2[[3]][[1]]
Using sapply, I can identify which elements are empty. However, when I try to assign a vector to this empty element, I get this warning:
Warning message: In l1[sapply(l1, `[[`, 1) == ""] <- l2[[3]][[1]] :
number of items to replace is not a multiple of replacement length
This is only a warning, but the result I get is not the one I want. This is the l1 I get:
> l1
$a
[1] 1 2 3
$b
[1] 4 5 6 7 8 9
$c
[1] 100
This is what I need (two elements in $c):
> l1
$a
[1] 1 2 3
$b
[1] 4 5 6 7 8 9
$c
[1] 100 101
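Just use l2[[3]][1] on the right-hand side (single [, not [[). Because l2[[3]] is a data frame, l2[[3]][1] is a one-column data frame, which is itself a list of length one.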
The right-hand side should be a list, since you're replacing a list element. So you want that to be
... <- list(l2[[3]][[1]])
In addition, you might consider using !nzchar(l1) in place of sapply(...) == "". It might be more efficient. The final expression would be:
l1[!nzchar(l1)] <- list(l2[[3]][[1]])
giving the updated l1:
$a
[1] 1 2 3
$b
[1] 4 5 6 7 8 9
$c
[1] 100 101
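To make the difference concrete, here is a small self-contained sketch using the data from the question. It shows the list-wrapped assignment and, as an alternative that assumes exactly one empty element, a double-bracket assignment. The names empty and l1b are introduced only for this example.

```r
l1 <- list(a = 1:3, b = 4:9, c = "")
l2 <- list(aa = 11:13, bb = 14:19, cc = data.frame(matrix(100:103, ncol = 2)))

# Logical index of the empty elements (same test as in the question)
empty <- sapply(l1, `[[`, 1) == ""

# Single-bracket assignment expects a list on the right-hand side,
# with one entry per selected element
l1[empty] <- list(l2[[3]][[1]])
l1$c
# [1] 100 101

# With exactly one empty element, double brackets assign the vector itself
l1b <- list(a = 1:3, b = 4:9, c = "")
l1b[[which(sapply(l1b, `[[`, 1) == "")]] <- l2[[3]][[1]]
```

Both routes should leave the same list; the single-bracket form is the one that generalizes when several elements are selected at once.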

remove certain vectors from a list

I want to remove certain vectors from a list. I have for example this:
a<-c(1,2,5)
b<-c(1,1,1)
c<-c(1,2,3,4)
d<-c(1,2,3,4,5)
exampleList<-list(a,b,c,d)
exampleList returns of course:
[[1]]
[1] 1 2 5
[[2]]
[1] 1 1 1
[[3]]
[1] 1 2 3 4
[[4]]
[1] 1 2 3 4 5
Is there a way to remove certain vectors from a list in R? I want to remove all vectors in the list exampleList that contain both 1 and 5 (so not vectors that contain only 1 or only 5, but both). Thanks in advance!
Use Filter:
filteredList <- Filter(function(v) !(1 %in% v && 5 %in% v), exampleList)
print(filteredList)
#> [[1]]
#> [1] 1 1 1
#>
#> [[2]]
#> [1] 1 2 3 4
Filter uses a functional style. The first argument you pass is a function that returns TRUE for an element you want to keep in the list, and FALSE for an element you want to remove from the list. The second argument is just the list itself.
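As a sketch of how the predicate can be generalized, the test "contains every one of these values" can be pulled into a small helper. The name contains_all is hypothetical and not part of base R; the data is the unnamed list from the question.

```r
exampleList <- list(c(1, 2, 5), c(1, 1, 1), c(1, 2, 3, 4), c(1, 2, 3, 4, 5))

# Hypothetical helper: TRUE when v contains every value in `needed`
contains_all <- function(v, needed) all(needed %in% v)

# Filter() keeps the elements for which the predicate returns TRUE,
# so negate the test to drop vectors containing both 1 and 5
kept <- Filter(function(v) !contains_all(v, c(1, 5)), exampleList)
length(kept)
# [1] 2
```

Changing c(1, 5) to any other set of values reuses the same predicate without rewriting the condition.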
We can use sapply on every list element and remove those elements where both the values 1 and 5 are present.
exampleList[!sapply(exampleList, function(x) any(x == 1) & any(x == 5))]
#[[1]]
#[1] 1 1 1
#[[2]]
#[1] 1 2 3 4
Here is a solution in two steps:
exampleList<-list(a=c(1,2,5), b=c(1,1,1), c=c(1,2,3,4), d=c(1,2,3,4,5))
L <- lapply(exampleList, function(x) if (!all(c(1,5) %in% x)) x)
L[!sapply(L, is.null)]
# $b
# [1] 1 1 1
#
# $c
# [1] 1 2 3 4
Here is a one-step variant that does not define a new function:
exampleList[!apply(sapply(exampleList, '%in%', x=c(1,5)), 2, all)]
(... but it makes two calls to apply-family functions)
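To see what the one-step variant is doing, it may help to look at the intermediate matrix. In this sketch the names m and drop are made up for illustration, and colSums stands in for the apply(..., 2, all) call.

```r
exampleList <- list(c(1, 2, 5), c(1, 1, 1), c(1, 2, 3, 4), c(1, 2, 3, 4, 5))

# One column per list element; row 1 answers "is 1 present?",
# row 2 answers "is 5 present?"
m <- sapply(exampleList, `%in%`, x = c(1, 5))
dim(m)
# [1] 2 4

# A column whose entries are all TRUE marks a vector to drop
drop <- colSums(m) == nrow(m)
exampleList[!drop]
```

Note that sapply only returns a matrix here because every call yields a result of the same length (one logical per searched value).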

Remove elements in a list in R

I want to remove the elements of a list that are complete subsets of another element. For example, B is a subset of A and E is a subset of C, so B and E should be removed.
MyList <- list(A=c(1,2,3,4,5), B=c(3,4,5), C=c(6,7,8,9), E=c(7,8))
MyList
$A
[1] 1 2 3 4 5
$B
[1] 3 4 5
$C
[1] 6 7 8 9
$E
[1] 7 8
MyListUnique <- RemoveSubElements(MyList)
MyListUnique
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9
Any ideas? Is there a known function to do this?
As long as your data is not too huge, you can use an approach like the following:
# preparation
MyList <- MyList[order(lengths(MyList))]
idx <- vector("list", length(MyList))
# loop through list and compare with other (longer) list elements
for(i in seq_along(MyList)) {
idx[[i]] <- any(sapply(MyList[-seq_len(i)], function(x) all(MyList[[i]] %in% x)))
}
# subset the list
MyList[!unlist(idx)]
#$C
#[1] 6 7 8 9
#
#$A
#[1] 1 2 3 4 5
Similar to the other answer, but hopefully clearer, using a helper function and 2 sapplys.
#helper function to determine a proper subset - shortcuts to avoid setdiff calculation if they are equal
is.proper.subset <- function(x,y) !setequal(x,y) && length(setdiff(x,y))==0
#double loop over the list to find elements which are proper subsets of other elements
idx <- sapply(MyList, function(x) any(sapply(MyList, function(y) is.proper.subset(x,y))))
#filter out those that are proper subsets
MyList[!idx]
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9
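The same idea can be wrapped into the RemoveSubElements function the question asks for. This is only a sketch under the assumption that "sub element" means a proper subset; the function body and the inner helper is_proper_subset are made up here and are not an existing R function.

```r
# Hypothetical implementation of the RemoveSubElements() named in the question
RemoveSubElements <- function(lst) {
  # x is a proper subset of y: everything in x is in y, but not vice versa
  is_proper_subset <- function(x, y) all(x %in% y) && !all(y %in% x)
  drop <- vapply(
    lst,
    function(x) any(vapply(lst, function(y) is_proper_subset(x, y), logical(1))),
    logical(1)
  )
  lst[!drop]
}

MyList <- list(A = c(1, 2, 3, 4, 5), B = c(3, 4, 5), C = c(6, 7, 8, 9), E = c(7, 8))
names(RemoveSubElements(MyList))
# [1] "A" "C"
```

Because only proper subsets are dropped, two elements holding exactly the same set of values are both kept. The comparison is quadratic in the number of list elements, so it suits lists of moderate size.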

R: how to find index of all repetition vector values order by unique vector without using loop?

I have a vector of integers like this:
a <- c(2,3,4,1,2,1,3,5,6,3,2)
values<-c(1,2,3,4,5,6)
I want to list, for every unique value in my vector (the unique values being ordered), the positions of its occurrences. My desired output:
rep_indx<-data.frame(c(4,6),c(1,5,11),c(2,7,10),c(3),c(8),c(9))
split fits pretty well here, which returns a list of indexes for each unique value in a:
indList <- split(seq_along(a), a)
indList
# $`1`
# [1] 4 6
#
# $`2`
# [1] 1 5 11
#
# $`3`
# [1] 2 7 10
#
# $`4`
# [1] 3
#
# $`5`
# [1] 8
#
# $`6`
# [1] 9
And you can access the index by passing the value as a character, i.e.:
indList[["1"]]
# [1] 4 6
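If the values vector may contain entries that never occur in a, splitting on a factor with explicit levels keeps an empty slot for them. In this minimal sketch the value 7 is added purely for illustration and is not in the original data.

```r
a <- c(2, 3, 4, 1, 2, 1, 3, 5, 6, 3, 2)
values <- 1:7  # 7 does not occur in a (made up for this example)

# Explicit factor levels give one list entry per value, in the order of `values`
indList <- split(seq_along(a), factor(a, levels = values))

indList[["2"]]
# [1]  1  5 11
indList[["7"]]
# integer(0)
```

Without the explicit levels, split only creates entries for values that are actually present.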
You can do this, using sapply. The ordering that you need is ensured by the sort function.
sapply(sort(unique(a)), function(x) which(a %in% x))
#### [[1]]
#### [1] 4 6
####
#### [[2]]
#### [1] 1 5 11
#### ...
It will result in a list, giving the indices of your repetitions. It can't be a data.frame because a data.frame needs to have columns of same lengths.
sort(unique(a)) is exactly your values vector.
NOTE: you can also use lapply to force the output to be a list. With sapply you get a list unless every value happens to occur the same number of times, in which case the output is simplified to a matrix (or to a plain vector if each value occurs exactly once). So, your choice!
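A short sketch of that simplification behaviour, using a made-up vector b in which every value occurs exactly twice:

```r
b <- c(1, 2, 1, 2)  # hypothetical input: each value appears twice

# sapply() simplifies equal-length results into a matrix
s <- sapply(sort(unique(b)), function(x) which(b == x))
is.matrix(s)
# [1] TRUE

# lapply() always returns a list, whatever the lengths are
l <- lapply(sort(unique(b)), function(x) which(b == x))
is.list(l)
# [1] TRUE
```

If downstream code expects a list, lapply (or sapply with simplify = FALSE) avoids this surprise.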
Perhaps this also works, if a single flat vector of positions (ordered by value rather than grouped) is enough:
order(match(a, values))
#[1] 4 6 1 5 11 2 7 10 3 8 9
You can use the lapply function to return a list with the indexes.
lapply(values, function (x) which(a == x))

Why are these sequences reversed when generated with the colon operator?

I've noticed that when I try to generate a list of sequences with the : operator (without an anonymous function), the sequences are always reversed. Take the following example.
x <- c(4, 6, 3)
lapply(x, ":", from = 1)
# [[1]]
# [1] 4 3 2 1
#
# [[2]]
# [1] 6 5 4 3 2 1
#
# [[3]]
# [1] 3 2 1
But when I use seq, everything is fine.
lapply(x, seq, from = 1)
# [[1]]
# [1] 1 2 3 4
#
# [[2]]
# [1] 1 2 3 4 5 6
#
# [[3]]
# [1] 1 2 3
And from help(":") it is stated that
For other arguments from:to is equivalent to seq(from, to), and generates a sequence from from to to in steps of 1 or -1.
Why is the first list of sequences reversed?
Can I generate forward sequences this way with the colon operator with lapply?
Or do I always have to use lapply(x, function(y) 1:y)?
The ":" operator is implemented as the primitive do_colon function in C. This primitive function does not have named arguments. It simply takes the first parameter as the "from" and the second as the "to", ignoring any parameter names. See
`:`(to=10, from=5)
# [1] 10 9 8 7 6 5
Additionally, the lapply function only passes its values as a leading unnamed parameter in the function call. You cannot pass values to primitive functions via lapply as the second positional argument.
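Given that positional behaviour, here is a minimal sketch of a few ways to get forward sequences without writing an anonymous function. seq_len and Map are base R functions; the variable names rev_seqs, fwd1 and fwd2 are only for illustration.

```r
x <- c(4, 6, 3)

# lapply() supplies each value as the first argument, so it becomes `from`
rev_seqs <- lapply(x, `:`, 1)   # 4:1, 6:1, 3:1

# seq_len(n) is the forward sequence 1, 2, ..., n
fwd1 <- lapply(x, seq_len)      # 1:4, 1:6, 1:3

# Map() passes both endpoints by position, so 1 is `from` and x is `to`
fwd2 <- Map(`:`, 1, x)
```

seq_len also has the advantage of returning an empty vector for 0, whereas 1:0 counts down to give c(1, 0).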
