Delete list conditional on the number of elements in R - r

I have a list L of unnamed character vectors of unequal length. I need to drop from L the vectors that have fewer than 4 elements. How can this be done? Example L:
> L
[[1]]
[1] "A" "B" "C" "D"
[[2]]
[1] "E" "F" "G"
In the example above I would like to end up with:
> L
[[1]]
[1] "A" "B" "C" "D"

We can use lengths to get the length of the list elements as a vector, create a logical vector based on that and subset the list
L[lengths(L)>3]
#[[1]]
#[1] "A" "B" "C" "D"
A less optimized approach (used earlier) is to loop through the list elements with sapply, get the length and use that to subset
L[sapply(L, length)>3]
data
L <- list(LETTERS[1:4], LETTERS[5:7])
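As a minimal sketch of the same idea, the threshold can be pulled out into a variable and the result cross-checked against Filter, which keeps the elements for which a predicate returns TRUE. The names min_len, kept and kept_filter are hypothetical and only used for illustration.

```r
L <- list(LETTERS[1:4], LETTERS[5:7])

# hypothetical threshold: keep vectors with at least this many elements
min_len <- 4

# logical subsetting on the per-element lengths
kept <- L[lengths(L) >= min_len]

# same result using a predicate function
kept_filter <- Filter(function(v) length(v) >= min_len, L)
```

Both forms return a list, so the result stays a list even when only one element survives.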

Related

Trim text after character for every item in list - R

I am trying to remove the text before and including a character ("-") for every element in a list.
Ex-
x = list(c("a-b","b-c","c-d"),c("a-b","e-f"))
desired output:
"b" "c" "d"
"b" "f"
I have tried using various combinations of lapply and gsub, such as
lapply(x,gsub,'.*-','',x)
but this just returns a null list-
[[1]]
[1] ""
[[2]]
[1] ""
And only using
gsub(".*-","",x)
returns
"d\")" "f\")"
You are close. lapply passes each list element to gsub as its first unnamed argument, and the first argument of gsub is pattern, not x. If you name pattern and replacement explicitly, each element falls into the remaining x slot.
x <- list(c("a-b","b-c","c-d"),c("a-b","e-f"))
lapply(x, gsub, pattern = "^.*-", replacement = "")
[[1]]
[1] "b" "c" "d"
[[2]]
[1] "b" "f"
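To make the argument matching explicit, here is a small sketch that compares the named-argument call with an anonymous function that spells out where each element goes. The names by_name and by_lambda are hypothetical; sub is used in the second form because the anchored pattern can match only once per string anyway.

```r
x <- list(c("a-b", "b-c", "c-d"), c("a-b", "e-f"))

# named arguments: each list element is matched to the remaining slot, x
by_name <- lapply(x, gsub, pattern = "^.*-", replacement = "")

# anonymous function: the element is passed as the third argument explicitly
by_lambda <- lapply(x, function(el) sub("^.*-", "", el))
```

Either form avoids relying on positional matching inside lapply.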
This can be done with a for loop.
val <- vector("list", length(x))
for (i in seq_along(x)) {
  # drop everything up to and including the last "-"
  val[[i]] <- gsub(".*-", "", x[[i]])
}
val
[[1]]
[1] "b" "c" "d"
[[2]]
[1] "b" "f"

How do I apply an index vector over a list of vectors?

I want to apply a long index vector (50+ non-sequential integers) to a long list of vectors (50+ character vectors containing 100+ names) in order to retrieve specific values (as a list, vector, or data frame).
A simplified example is below:
> my.list <- list(c("a","b","c"),c("d","e","f"))
> my.index <- 2:3
Desired Output
[[1]]
[1] "b"
[[2]]
[1] "f"
##or
[1] "b"
[1] "f"
##or
[1] "b" "f"
I know I can get the same value from each element using:
> lapply(my.list, function(x) x[2])
##or
> lapply(my.list,'[', 2)
I can pull the second and third values from each element by:
> lapply(my.list,'[', my.index)
[[1]]
[1] "b" "c"
[[2]]
[1] "e" "f"
##or
> for(j in my.index) for(i in seq_along(my.list)) print(my.list[[i]][[j]])
[1] "b"
[1] "e"
[1] "c"
[1] "f"
I don't know how to pull just the one value from each element.
I've been looking for a few days and haven't found any examples of this being done, but it seems fairly straightforward. Am I missing something obvious here?
Thank you,
Scott
Whenever you have a problem that is like lapply but involves multiple parallel lists/vectors, consider Map or mapply (Map simply being a wrapper around mapply with SIMPLIFY=FALSE hardcoded).
Try this:
Map("[",my.list,my.index)
#[[1]]
#[1] "b"
#
#[[2]]
#[1] "f"
..or:
mapply("[",my.list,my.index)
#[1] "b" "f"
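If you want the result type to be checked, a possible variant (a sketch, not the only way) walks both objects by position with vapply and declares that each pick is a single string. The names picked and picked_typed are hypothetical.

```r
my.list <- list(c("a", "b", "c"), c("d", "e", "f"))
my.index <- 2:3

# parallel iteration: the i-th index is applied to the i-th vector
picked <- mapply(function(v, i) v[i], my.list, my.index)

# same pairing by position, but vapply fails loudly if a pick
# is not exactly one character value
picked_typed <- vapply(
  seq_along(my.list),
  function(k) my.list[[k]][my.index[k]],
  character(1)
)
```

Both assume my.list and my.index have the same length; mapply would otherwise recycle the shorter one.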

Selecting and matching multiple vectors in a list in R

I have a list of vectors like this:
>list
[[1]]
[1] "a" "m" "l" "s" "t" "o"
[[2]]
[1] "m" "y" "o" "t" "e"
[[3]]
[1] "n" "a" "s"
[[4]]
[1] "b" "u" "z" "u" "l" "a"
[[5]]
[1] "c" "m" "u" "s" "r" "i" "x" "t"
1-First, I want to select the vector in the list with the highest number of elements (in this case the 5th vector, with 8 elements). This is easy.
2-Second, I want to select all vectors in the list whose length is equal to or immediately below that of the previous one, and intersect them with the previous vector.
Another possibility is to select by the first character. In this case that would be equivalent to selecting the vectors starting with "a" or "b", the first and fourth in the list. What I do not know is how to select multiple vectors in a list knowing their first element.
3-Finally, I want to keep just the intersection with the minimum number of matches.
In this case that is the fourth vector in the list, starting with "b". Then the process starts again for the rest of the vectors, but already taking the 4th and 5th vectors into account when intersecting. Here that would mean picking up the second element and intersecting it with a "unique() combination" of the 4th and 5th.
I hope I have explained myself! Is there a way to do this in R without 3-4 "for" and "if" loops? In other words, is there a clever way to do it using lapply or similar?
This should do it?
# strsplit() needs a character vector, so build the input with c(), not list()
list <- strsplit(c("amlsto", "myote", "nas", "buzula", "cmusrixt"), "")
# index of the longest vector (step 1)
lens <- lengths(list)
which.max(lens)
# which are the same length as, or 1 shorter than, the previous vector
prev <- c(-1, head(lens, -1))
inds <- which(lens == prev | lens == prev - 1)
# get the intersections
inters <- mapply(intersect, list[inds], list[inds - 1], SIMPLIFY = FALSE)
# get items whose first element is in the target set
target <- c("a", "b")
isTarget <- sapply(list, "[[", 1) %in% target
# minimum number of overlaps
which.min(lengths(inters))
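For the "select by first element" variant described in the question, one possible sketch follows. It assumes the goal is to pick the vectors starting with "a" or "b", intersect each with the longest vector, and keep the one with the fewest matches. The names vecs, first_chars, selected, longest and overlap_counts are hypothetical (vecs is used instead of list to avoid masking the base function).

```r
vecs <- strsplit(c("amlsto", "myote", "nas", "buzula", "cmusrixt"), "")
target <- c("a", "b")

# first element of every vector, as a plain character vector
first_chars <- vapply(vecs, `[[`, character(1), 1)

# vectors whose first element is in the target set
selected <- vecs[first_chars %in% target]

# the longest vector in the list (step 1 of the question)
longest <- vecs[[which.max(lengths(vecs))]]

# number of distinct shared elements between each selected vector and the longest
overlap_counts <- vapply(
  selected,
  function(v) length(intersect(v, longest)),
  integer(1)
)

# the selected vector with the fewest matches
fewest <- selected[[which.min(overlap_counts)]]
```

With the example data this picks the vector starting with "b", which agrees with the outcome described in the question.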

Subsetting on all but empty grep returns empty vector

Suppose I have some character vector, which I'd like to subset to elements that don't match some regular expression. I might use the - operator to remove the subset that grep matches:
> vec <- letters[1:5]
> vec
[1] "a" "b" "c" "d" "e"
> vec[-grep("d", vec)]
[1] "a" "b" "c" "e"
I'm given back everything except the entries that matched "d". But if I search for a regular expression that isn't found, instead of getting everything back as I would expect, I get nothing back:
> vec[-grep("z", vec)]
character(0)
Why does this happen?
It's because grep returns an integer vector, and when there's no match, it returns integer(0).
> grep("d", vec)
[1] 4
> grep("z", vec)
integer(0)
and since the - operator works elementwise, and integer(0) has no elements, negating it doesn't change the integer vector:
> -integer(0)
integer(0)
so vec[-grep("z", vec)] evaluates to vec[-integer(0)] which in turn evaluates to vec[integer(0)], which is character(0).
You will get the behavior you expect with invert = TRUE:
> vec[grep("d", vec, invert = TRUE)]
[1] "a" "b" "c" "e"
> vec[grep("z", vec, invert = TRUE)]
[1] "a" "b" "c" "d" "e"
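Another way to sidestep the empty-index problem, shown here as a brief sketch, is to subset with a logical vector from grepl instead of negating integer positions. A logical index always has one entry per element, so "no match" becomes all TRUE after negation rather than an empty index. The variable names are hypothetical.

```r
vec <- letters[1:5]

# logical index: TRUE where the pattern does NOT match
without_d <- vec[!grepl("d", vec)]
without_z <- vec[!grepl("z", vec)]

# equivalent with grep, returning the values directly
without_z_grep <- grep("z", vec, invert = TRUE, value = TRUE)
```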

R - generate all combinations from 2 vectors given constraints

I would like to generate all combinations of two vectors, given two constraints: there can never be more than 3 characters from the first vector, and there must always be at least one character from the second vector. I would also like to vary the final number of characters in the combination.
For instance, here are two vectors:
vec1=c("A","B","C","D")
vec2=c("W","X","Y","Z")
Say I wanted 3 characters in the combination. Possible acceptable permutations would be: "A" "B" "X" or "A" "Y" "Z". An unacceptable permutation would be: "A" "B" "C" since there is not at least one character from vec2.
Now say I wanted 5 characters in the combination. Possible acceptable permutations would be: "A" "C" "Z" "Y" or "A" "Y" "Z" "X". An unacceptable permutation would be: "A" "C" "D" "B" "X" since there are >3 characters from vec1.
I suppose I could use expand.grid to generate all combinations and then somehow subset, but there must be an easier way. Thanks in advance!
I'm not sure whether this is easier, but you can avoid generating permutations that do not satisfy your conditions with this strategy:
1. Generate all combinations from vec1 that are acceptable.
2. Generate all combinations from vec2 that are acceptable.
3. Generate all combinations taking one solution from 1. plus one solution from 2. Here I'd do the filtering on the third condition (the total number of characters) afterwards.
4. (If you're looking for combinations, you're done; otherwise:) produce all permutations of letters within each result.
Now, let's have
vec1 <- LETTERS [1:4]
vec2 <- LETTERS [23:26]
## lists can eat up lots of memory, so use character vectors instead.
combine <- function (x, y)
combn (y, x, paste, collapse = "")
res1 <- unlist (lapply (0:3, combine, vec1))
res2 <- unlist (lapply (1:length (vec2), combine, vec2))
now we have:
> res1
[1] "" "A" "B" "C" "D" "AB" "AC" "AD" "BC" "BD" "CD" "ABC"
[13] "ABD" "ACD" "BCD"
> res2
[1] "W" "X" "Y" "Z" "WX" "WY" "WZ" "XY" "XZ" "YZ"
[11] "WXY" "WXZ" "WYZ" "XYZ" "WXYZ"
res3 <- outer (res1, res2, paste0)
res3 <- res3 [nchar (res3) == 5]
So here you are:
> res3
[1] "ABCWX" "ABDWX" "ACDWX" "BCDWX" "ABCWY" "ABDWY" "ACDWY" "BCDWY" "ABCWZ"
[10] "ABDWZ" "ACDWZ" "BCDWZ" "ABCXY" "ABDXY" "ACDXY" "BCDXY" "ABCXZ" "ABDXZ"
[19] "ACDXZ" "BCDXZ" "ABCYZ" "ABDYZ" "ACDYZ" "BCDYZ" "ABWXY" "ACWXY" "ADWXY"
[28] "BCWXY" "BDWXY" "CDWXY" "ABWXZ" "ACWXZ" "ADWXZ" "BCWXZ" "BDWXZ" "CDWXZ"
[37] "ABWYZ" "ACWYZ" "ADWYZ" "BCWYZ" "BDWYZ" "CDWYZ" "ABXYZ" "ACXYZ" "ADXYZ"
[46] "BCXYZ" "BDXYZ" "CDXYZ" "AWXYZ" "BWXYZ" "CWXYZ" "DWXYZ"
If you prefer the results split into single letters:
res <- matrix (unlist (strsplit (res3, "")), nrow = length (res3), byrow = TRUE)
> res
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "B" "C" "W" "X"
[2,] "A" "B" "D" "W" "X"
[3,] "A" "C" "D" "W" "X"
[4,] "B" "C" "D" "W" "X"
(snip)
[51,] "C" "W" "X" "Y" "Z"
[52,] "D" "W" "X" "Y" "Z"
Which are your combinations.
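To vary the final number of characters, the same approach can be wrapped in a function. This is only a sketch under stated assumptions: constrained_combos, n and max_from_v1 are hypothetical names, n must be at least 1, and instead of filtering with nchar afterwards it pairs k letters from the first vector with n - k letters from the second, which assumes every element is a single character.

```r
vec1 <- LETTERS[1:4]
vec2 <- LETTERS[23:26]

# hypothetical helper: all combinations of n letters with at most
# max_from_v1 letters from v1 and at least one letter from v2
constrained_combos <- function(v1, v2, n, max_from_v1 = 3) {
  stopifnot(n >= 1)
  out <- character(0)
  # k = number of letters taken from v1; n - k >= 1 come from v2
  for (k in 0:min(max_from_v1, n - 1, length(v1))) {
    need <- n - k
    if (need > length(v2)) next  # v2 cannot supply that many letters
    left <- if (k == 0) "" else combn(v1, k, paste, collapse = "")
    right <- combn(v2, need, paste, collapse = "")
    out <- c(out, as.vector(outer(left, right, paste0)))
  }
  out
}

res5 <- constrained_combos(vec1, vec2, 5)
```

For n = 5 this yields the same 52 strings as res3 above, though not necessarily in the same order.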
