Remove certain vectors from a list in R

I want to remove certain vectors from a list. For example, I have this:
a<-c(1,2,5)
b<-c(1,1,1)
c<-c(1,2,3,4)
d<-c(1,2,3,4,5)
exampleList<-list(a,b,c,d)
Printing exampleList of course returns:
[[1]]
[1] 1 2 5
[[2]]
[1] 1 1 1
[[3]]
[1] 1 2 3 4
[[4]]
[1] 1 2 3 4 5
Is there a way to remove certain vectors from a list in R? I want to remove all vectors in exampleList that contain both 1 and 5 (not vectors that contain only 1 or only 5, but those that contain both). Thanks in advance!

Use Filter:
filteredList <- Filter(function(v) !(1 %in% v & 5 %in% v), exampleList)
print(filteredList)
#> [[1]]
#> [1] 1 1 1
#>
#> [[2]]
#> [1] 1 2 3 4
Filter uses a functional style. The first argument you pass is a function that returns TRUE for an element you want to keep in the list, and FALSE for an element you want to remove from the list. The second argument is just the list itself.
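If you prefer to write the predicate as the removal condition, base R's Negate() flips it into the keep condition that Filter expects. A minimal sketch; the helper name contains_all is hypothetical and not part of base R:

```r
exampleList <- list(c(1, 2, 5), c(1, 1, 1), c(1, 2, 3, 4), c(1, 2, 3, 4, 5))

# Hypothetical helper: TRUE when v contains every value in `required`
contains_all <- function(v, required = c(1, 5)) all(required %in% v)

# Negate() returns a function giving !contains_all(v), i.e. "keep this element"
filteredList <- Filter(Negate(contains_all), exampleList)
filteredList
```

This keeps the second and third vectors, the same result as the Filter call above.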

We can use sapply to test every list element and remove the elements in which both 1 and 5 are present.
exampleList[!sapply(exampleList, function(x) any(x == 1) & any(x == 5))]
#[[1]]
#[1] 1 1 1
#[[2]]
#[1] 1 2 3 4
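A variant of the same idea, assuming you want a guaranteed logical result: vapply() with logical(1) always returns a logical vector (for an empty input list it returns logical(0), whereas sapply() returns list()), and all(c(1, 5) %in% x) expresses "contains both" in a single call. A minimal sketch:

```r
exampleList <- list(c(1, 2, 5), c(1, 1, 1), c(1, 2, 3, 4), c(1, 2, 3, 4, 5))

# TRUE for each element containing both 1 and 5; vapply() checks that
# every result is a single logical value
hasBoth <- vapply(exampleList, function(x) all(c(1, 5) %in% x), logical(1))

# keep only the elements that do not contain both values
result <- exampleList[!hasBoth]
result
```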

Here is a solution in two steps:
exampleList<-list(a=c(1,2,5), b=c(1,1,1), c=c(1,2,3,4), d=c(1,2,3,4,5))
L <- lapply(exampleList, function(x) if (!all(c(1,5) %in% x)) x)
L[!sapply(L, is.null)]
# $b
# [1] 1 1 1
#
# $c
# [1] 1 2 3 4
Here is a one-step variant that does not define a new function:
exampleList[!apply(sapply(exampleList, '%in%', x=c(1,5)), 2, all)]
(...but it makes two calls to apply-family functions.)
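To drop the second apply-family call, one option (a sketch, not from the original answer) is to count matches with colSums(): the sapply() call yields a 2-row logical matrix with one column per list element, and a column sum of 2 means both 1 and 5 were found.

```r
exampleList <- list(a = c(1, 2, 5), b = c(1, 1, 1), c = c(1, 2, 3, 4), d = c(1, 2, 3, 4, 5))

# Row 1: is 1 in the element? Row 2: is 5 in the element? One column per element.
found <- sapply(exampleList, `%in%`, x = c(1, 5))

# Keep the elements where fewer than both values were found
kept <- exampleList[colSums(found) < 2]
kept
```

This assumes a non-empty list: for an empty list, sapply() returns list() instead of a matrix, and colSums() fails.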

Related

What is the best way to append values to a sublist of a list in R?

Say I have a list l containing sublists, and I would like to add an element (or several elements) at the end of each sublist of that list.
What is the best way to implement this (simplest/fastest code)?
# Goal: Append given elements to each sublist of a list in R
# Example: add c(9) to sublists of list l
l <- list(c(1,2,3), c(2,1,4), c(4,7,6))
l
[[1]]
[1] 1 2 3
[[2]]
[1] 2 1 4
[[3]]
[1] 4 7 6
# Desired Output:
# l_9
# [[1]]
# [1] 1 2 3 9
# [[2]]
# [1] 2 1 4 9
# [[3]]
# [1] 4 7 6 9
In cases like this, the lapply function is useful (see the documentation of the apply family of functions, ?lapply).
So you can just use l <- lapply(l, c, 9)
where c is the combine function (see ?c); lapply calls c(l[[i]], 9) for each sublist.
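The same pattern extends to several values, because extra arguments to lapply() are passed on to c(). When each sublist needs a different value, base R's Map() pairs the sublists with the values element by element. A short sketch:

```r
l <- list(c(1, 2, 3), c(2, 1, 4), c(4, 7, 6))

# Append the same two values to every sublist: c(l[[i]], 9, 10)
l_two <- lapply(l, c, 9, 10)

# Append a different value to each sublist: Map() calls c(l[[i]], vals[i])
vals <- c(7, 8, 9)
l_each <- Map(c, l, vals)
```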

Remove elements in a list in R

I want to remove the elements of a list that are complete subsets of other elements of the list. For example, B is a subset of A and E is a subset of C, therefore B and E should be removed.
MyList <- list(A=c(1,2,3,4,5), B=c(3,4,5), C=c(6,7,8,9), E=c(7,8))
MyList
$A
[1] 1 2 3 4 5
$B
[1] 3 4 5
$C
[1] 6 7 8 9
$E
[1] 7 8
MyListUnique <- RemoveSubElements(MyList)
MyListUnique
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9
Any ideas? Is there a known function that does this?
As long as your data is not too huge, you can use an approach like the following:
# preparation
MyList <- MyList[order(lengths(MyList))]
idx <- vector("list", length(MyList))
# loop through list and compare with other (longer) list elements
for(i in seq_along(MyList)) {
  idx[[i]] <- any(sapply(MyList[-seq_len(i)], function(x) all(MyList[[i]] %in% x)))
}
# subset the list
MyList[!unlist(idx)]
#$C
#[1] 6 7 8 9
#
#$A
#[1] 1 2 3 4 5
Similar to the other answer, but hopefully clearer: a helper function and two nested sapply calls.
#helper function to determine a proper subset - shortcuts to avoid setdiff calculation if they are equal
is.proper.subset <- function(x,y) !setequal(x,y) && length(setdiff(x,y))==0
#double loop over the list to find elements which are proper subsets of other elements
idx <- sapply(MyList, function(x) any(sapply(MyList, function(y) is.proper.subset(x,y))))
#filter out those that are proper subsets
MyList[!idx]
$A
[1] 1 2 3 4 5
$C
[1] 6 7 8 9
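The double loop can also be pictured as a matrix. The sketch below (an alternative formulation, not from either answer) uses outer() with Vectorize() to build an n-by-n logical matrix whose entry [i, j] is TRUE when element i is a proper subset of element j, then keeps the rows with no TRUE entry. The helper name proper_subset is illustrative; it uses the same definition as is.proper.subset.

```r
MyList <- list(A = c(1, 2, 3, 4, 5), B = c(3, 4, 5), C = c(6, 7, 8, 9), E = c(7, 8))

# TRUE when x is a proper subset of y (equal sets are not proper subsets)
proper_subset <- function(x, y) !setequal(x, y) && all(x %in% y)

# m[i, j] is TRUE when element i is a proper subset of element j
n <- seq_along(MyList)
m <- outer(n, n, Vectorize(function(i, j) proper_subset(MyList[[i]], MyList[[j]])))
dimnames(m) <- list(names(MyList), names(MyList))

# drop every element that is a proper subset of some other element
MyListUnique <- MyList[rowSums(m) == 0]
MyListUnique
```

Like the nested sapply version, this performs a subset check for every pair of elements; the matrix form is mainly useful when you also want to see which element contains which.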

Check if there is overlap between elements of a list

I have a list of integer vectors and I want to check whether the elements are all unique, i.e. that no value is shared between elements.
set.seed(2)
x <- list(a=sample(10,3),b=sample(10,5),c=sample(10,7))
x
# $a
# [1] 2 7 5
# $b
# [1] 2 9 8 1 6
# $c
# [1] 5 10 9 2 8 1 7
For this example, each of the following situations fails the check: 1) 2 appears in all entries, 2) 5 appears in $a and $c, 3) 8 appears in $b and $c, 4) 1 appears in $b and $c, etc.
y <- list(a=c(1,3,5),b=c(7,4),c=c(6,10))
There is no overlap between the elements of y, so it passes the check.
The expected output is just TRUE/FALSE, indicating whether the list passes the check.
You can convert the list to a vector with unlist and then check if any elements are duplicated in the vector with any and duplicated.
!any(duplicated(unlist(x)))
# [1] FALSE
!any(duplicated(unlist(y)))
# [1] TRUE
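One caveat: duplicated(unlist(x)) also flags values repeated within a single element, so list(a = c(1, 1), b = 2) fails the check even though its elements do not overlap. If only overlap between elements should count, a sketch is to deduplicate each element first (the function name no_overlap is hypothetical):

```r
# TRUE when no value appears in more than one element of the list;
# repeats inside a single element are ignored
no_overlap <- function(lst) !anyDuplicated(unlist(lapply(lst, unique)))

no_overlap(list(a = c(1, 1), b = 2))
no_overlap(list(a = c(1, 3, 5), b = c(7, 4), c = c(6, 10)))
no_overlap(list(a = c(2, 7, 5), b = c(2, 9, 8, 1, 6)))
```

anyDuplicated() returns the index of the first duplicate (0 if there is none), so negating it gives the TRUE/FALSE result directly.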

Getting the indices of a vector with delimited parts

I have vectors that look like these variations:
cn1 <- c("Probe","Genes","foo","bar","Probe","Genes","foo","bar")
# 0 1 2 3 4 5 6 7
cn2 <- c("Probe","Genes","foo","bar","qux","Probe","Genes","foo","bar","qux")
# 0 1 2 3 4 5 6 7 8 9
Note that each vector above consists of two parts, separated by "Probe","Genes".
I want to get the (0-based) indices of the entries in the first part, i.e. between the first and the second "Probe","Genes" separator, yielding:
cn1_id ------> [2,3]
cn2_id ------> [2,3,4]
How can I achieve that in R?
I tried this but it doesn't do what I want:
> split(cn1,c("Probe","Genes"))
$Genes
[1] "Genes" "bar" "Genes" "bar"
$Probe
[1] "Probe" "foo" "Probe" "foo"
Here's a function that you can use. Note that R vectors are 1-based so counting starts at 1 rather than 0.
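findidx <- function(x) {
  # positions where "Probe" is immediately followed by "Genes"
  idx <- which(x=="Probe" & c(tail(x,-1),NA)=="Genes")
  if (length(idx)>1) {
    # entries after the first "Probe","Genes" pair, up to the second "Probe"
    (idx[1]+2):(idx[2]-1)
  } else {
    NA # returned when fewer than two separators are found
  }
}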
findidx(cn1)
# [1] 3 4
findidx(cn2)
# [1] 3 4 5
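If the vectors can contain more than two parts, the same idea generalizes. The sketch below (the name find_parts is hypothetical) marks each "Probe" that is immediately followed by "Genes", numbers the parts with cumsum(), drops both separator entries, and returns the 1-based positions of every part; subtract 1 for the 0-based indices used in the question.

```r
find_parts <- function(x) {
  # TRUE where "Probe" is immediately followed by "Genes"
  starts <- x == "Probe" & c(x[-1], NA) == "Genes"
  starts[is.na(starts)] <- FALSE
  # part number of each position (0 before the first separator)
  part <- cumsum(starts)
  # the "Genes" entry sits one position after each start
  genes <- c(FALSE, head(starts, -1))
  # positions inside a part, excluding both separator entries
  body <- part > 0 & !starts & !genes
  split(which(body), part[body])
}

cn1 <- c("Probe","Genes","foo","bar","Probe","Genes","foo","bar")
cn2 <- c("Probe","Genes","foo","bar","qux","Probe","Genes","foo","bar","qux")

find_parts(cn1)[[1]]      # first part, 1-based
find_parts(cn2)[[1]] - 1  # first part, 0-based as in the question
```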
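You could try between from data.table. Note that between() compares strings as lower <= x <= upper, so this only isolates the separators when no other entry sorts between "Genes" and "Probe" (as with these vectors). The final - 1 converts to the 0-based indices used in the question.
library(data.table)
indx <- between(cn1, 'Genes', 'Probe')
indx2 <- between(cn2, 'Genes', 'Probe')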
which(cumsum(indx)==2)[-1]-1
#[1] 2 3
which(cumsum(indx2)==2)[-1]-1
#[1] 2 3 4

Why are these sequences reversed when generated with the colon operator?

I've noticed that when I try to generate a list of sequences with the : operator (without an anonymous function), the sequences are always reversed. Take the following example.
x <- c(4, 6, 3)
lapply(x, ":", from = 1)
# [[1]]
# [1] 4 3 2 1
#
# [[2]]
# [1] 6 5 4 3 2 1
#
# [[3]]
# [1] 3 2 1
But when I use seq, everything is fine.
lapply(x, seq, from = 1)
# [[1]]
# [1] 1 2 3 4
#
# [[2]]
# [1] 1 2 3 4 5 6
#
# [[3]]
# [1] 1 2 3
And help(":") states that
For other arguments from:to is equivalent to seq(from, to), and generates a sequence from from to to in steps of 1 or -1.
Why is the first list of sequences reversed?
Can I generate forward sequences this way, using the colon operator with lapply?
Or do I always have to use lapply(x, function(y) 1:y)?
The ":" operator is implemented as the primitive do_colon function in C. This primitive does not have named arguments: it simply takes the first argument as "from" and the second as "to", ignoring any argument names. For example:
`:`(to=10, from=5)
# [1] 10 9 8 7 6 5
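Additionally, lapply always passes each element as the first, unnamed argument of the function call, so lapply(x, ":", from = 1) evaluates `:`(x[[i]], from = 1), which the primitive treats as x[[i]]:1. You therefore cannot use lapply to pass each element to a primitive like : as its second positional argument.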
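Two ways to get forward sequences without an anonymous function, sketched here as alternatives (neither is from the original answer): seq_len() covers the common case of starting at 1, and Map() lets you supply the fixed start as the first positional argument, since it calls `:`(1, x[[i]]) for each element.

```r
x <- c(4, 6, 3)

# seq_len(n) is 1:n, but returns an empty vector when n is 0 (1:0 gives c(1, 0))
fwd1 <- lapply(x, seq_len)

# Map() recycles the 1, so the calls are `:`(1, 4), `:`(1, 6), `:`(1, 3)
fwd2 <- Map(`:`, 1, x)
```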
