Remove duplicates in a nested list - r

I have a large list of lists where I want to remove duplicated elements in each list. Example:
x <- list(c("A", "A", "B", "C"), c("O", "C", "A", "Z", "O"))
x
[[1]]
[1] "A" "A" "B" "C"
[[2]]
[1] "O" "C" "A" "Z" "O"
I want the result to be a list that looks like this, where duplicates within a list are removed, but the structure of the list remains.
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "O" "C" "A" "Z"
My main strategy has been to use rapply (also tried lapply) to identify duplicates and remove them. I tried:
x[rapply(x, duplicated) == T]
but received the following error:
"Error: (list) object cannot be coerced to type 'logical'"
Does anyone know a way to solve this issue?
Thanks!

We can use lapply with unique
lapply(x, unique)
#[[1]]
#[1] "A" "B" "C"
#[[2]]
#[1] "O" "C" "A" "Z"
The issue with rapply is that it applies duplicated recursively and, by default (how = "unlist"), flattens the result into a single logical vector instead of returning a list of logical vectors:
rapply(x, duplicated)
#[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Instead, apply duplicated to each element with lapply and keep the non-duplicated entries:
lapply(x, function(u) u[!duplicated(u)])
#[[1]]
#[1] "A" "B" "C"
#[[2]]
#[1] "O" "C" "A" "Z"
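If you would rather stay with rapply, a minimal sketch (assuming every leaf of the list is an atomic vector) is to request a list result with how = "list", which keeps the nesting instead of flattening it:

```r
x <- list(c("A", "A", "B", "C"), c("O", "C", "A", "Z", "O"))

# how = "list" keeps the list structure; the default how = "unlist" flattens it
res <- rapply(x, unique, how = "list")
res
```

For a flat list of vectors like this one, lapply(x, unique) is simpler; rapply is mainly useful when the nesting is deeper.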

Related

Trim text after character for every item in list - R

I am trying to remove the text before and including a character ("-") for every element in a list.
Ex-
x = list(c("a-b","b-c","c-d"),c("a-b","e-f"))
desired output:
"b" "c" "d"
"b" "f"
I have tried using various combinations of lapply and gsub, such as
lapply(x,gsub,'.*-','',x)
but this just returns a null list-
[[1]]
[1] ""
[[2]]
[1] ""
And only using
gsub(".*-","",x)
returns
"d\")" "f\")"
You are close. lapply passes each list element to gsub as the first unnamed argument, and the first argument of gsub is pattern, not x. If you name pattern and replacement explicitly, each element falls through to x:
x <- list(c("a-b","b-c","c-d"),c("a-b","e-f"))
lapply(x, gsub, pattern = "^.*-", replacement = "")
[[1]]
[1] "b" "c" "d"
[[2]]
[1] "b" "f"
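As a sketch of the same idea, an anonymous function makes it explicit which gsub argument receives each list element (the name v is only illustrative):

```r
x <- list(c("a-b", "b-c", "c-d"), c("a-b", "e-f"))

# Each element of x is bound to v and passed to gsub as its x argument
res <- lapply(x, function(v) gsub(pattern = "^.*-", replacement = "", x = v))
res
```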
This can be done with a for loop.
val <- vector("list", length(x))
for (i in seq_along(x)) {
  val[[i]] <- gsub(".*-", "", x[[i]])
}
val
[[1]]
[1] "b" "c" "d"
[[2]]
[1] "b" "f"

R: how to apply to a list a function that joins all subelements except the first one

I am struggling with manipulating lists; now I want to join all subelements in an element EXCEPT THE FIRST ONE, in one operation if possible.
For example, I have a list that looks like this:
[[1]] [1] "A" "B" "C" "D" "E" "F"
[[2]] [1] "A" "B" "C"
[[3]] [1] "A" "B" "C" "D"
[[4]] [1] "A" "B" "C" "D"
[[5]] [1] "A" "B" "C" "D" "E"
And I want to obtain this:
[[1]] [1] "B;C;D;E;F"
[[2]] [1] "B;C"
[[3]] [1] "B;C;D"
[[4]] [1] "B;C;D"
[[5]] [1] "B;C;D;E"
So I need a function to apply in this way:
list2 <- lapply(list1,
function(x) {
#something here
})
It would be awesome if the function could be easily modified to leave out a different subelement (not just the first one, but the 3rd, or the last, or 2nd to last...).
Many thanks!
Let's make a reproducible example:
> L = list(LETTERS[1:6], LETTERS[1:3],LETTERS[1:4],LETTERS[1:4],LETTERS[1:5])
> L
[[1]]
[1] "A" "B" "C" "D" "E" "F"
[[2]]
[1] "A" "B" "C"
[[3]]
[1] "A" "B" "C" "D"
[[4]]
[1] "A" "B" "C" "D"
[[5]]
[1] "A" "B" "C" "D" "E"
Then you drop the first element and paste everything else together with a semicolon:
> lapply(L, function(x){paste(x[-1],collapse=";")})
[[1]]
[1] "B;C;D;E;F"
[[2]]
[1] "B;C"
[[3]]
[1] "B;C;D"
[[4]]
[1] "B;C;D"
[[5]]
[1] "B;C;D;E"
If a list element contains only one item to start with, nothing is left after dropping it, so you get an empty string (no semicolons).
To leave out a different element, read up on R's vector indexing and change the x[-1] selection inside the function.
[ is itself a function, so it can be passed to lapply directly. You can try the following:
list1 <- list(
c("A", "B", "C"),
c("D", "E", "F", "G")
)
# for leaving out the first element
lapply(list1, `[`, -1)
# for leaving out the last element
lapply(list1, function(a) a[-length(a)])
# for leaving out a different position in each vector
# (here the 1st element of the first vector and the 2nd of the second)
Map(`[`, list1, -c(1, 2))
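Putting negative indexing and paste together, here is a minimal sketch of a reusable function. The name drop_and_join and its drop and sep parameters are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: remove the positions in `drop`, then join the rest
drop_and_join <- function(v, drop = 1, sep = ";") {
  paste(v[-drop], collapse = sep)
}

L <- list(LETTERS[1:6], LETTERS[1:3], LETTERS[1:4])

first_dropped <- lapply(L, drop_and_join)            # drop the 1st element
third_dropped <- lapply(L, drop_and_join, drop = 3)  # drop the 3rd element
# Positions counted from the end depend on each vector's length
last_dropped  <- lapply(L, function(v) drop_and_join(v, drop = length(v)))
```

Because drop can be a vector of positions, something like drop = c(1, 2) would leave out several elements at once.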

Error in get() in R

I have several vectors:
aa<-c("a","b","b","b",NA)
bb<-c("g","g","g","i",NA)
cc<-c("y","y","x","y",NA)
all<-c("aa","bb","cc")
I wrote a loop so that all NA will be replaced by the most frequent levels:
for (i in 1:3)
{
get(all[i])[is.na(get(all[i]))]<-names(which.max(table(get(all[i]))))
}
But it doesn't work. Can someone explain why? I suspect it has something to do with the get() function.
Thank you
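get() only returns the value of a variable, so it cannot be used as the target of an assignment. Collect the vectors into a list with mget() and modify them there:
lst1 <- lapply(mget(all), function(x) {
  # replace NA with the most frequent value (table() ignores NA by default)
  x[is.na(x)] <- names(which.max(table(x)))
  x
})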
lst1
# $aa
#[1] "a" "b" "b" "b" "b"
# $bb
#[1] "g" "g" "g" "i" "g"
#$cc
#[1] "y" "y" "x" "y" "y"
If you want to replace the NAs in the original variables:
list2env(lst1, envir=.GlobalEnv)
aa
#[1] "a" "b" "b" "b" "b"
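If you want to keep the loop from the question, a sketch of the usual workaround is to read each vector with get(), modify the copy, and write it back with assign(). The name vars is an assumption used here instead of all, to avoid masking base::all:

```r
aa <- c("a", "b", "b", "b", NA)
bb <- c("g", "g", "g", "i", NA)
cc <- c("y", "y", "x", "y", NA)
vars <- c("aa", "bb", "cc")  # hypothetical name for the vector of variable names

for (nm in vars) {
  v <- get(nm)                                # read a copy of the vector
  v[is.na(v)] <- names(which.max(table(v)))   # fill NA with the most frequent value
  assign(nm, v)                               # write the modified copy back
}
```

Keeping the vectors in a list, as in the mget() approach, is generally easier to work with than looping over variable names.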

How can I split a string and add them to vector?

I'd like to split a character vector so that additional members are added to the length of the vector.
> va <- c("a", "b", "c;d;e")
[1] "a" "b" "c;d;e"
> vb <- strsplit(va, ";")
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c" "d" "e"
How can I get vb in the same format as va, so that vb is a one-dimensional vector with 5 members like this?
[1] "a" "b" "c" "d" "e"
Appreciate the help.
One possibility:
unlist(vb)
# [1] "a" "b" "c" "d" "e"
Or
scan(text=va, sep=";",what="")
#Read 5 items
# [1] "a" "b" "c" "d" "e"
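The split and the flattening can also be combined in one step. This is a small sketch; fixed = TRUE is an addition not in the original and simply treats ";" as a literal string rather than a regular expression:

```r
va <- c("a", "b", "c;d;e")

# strsplit returns a list with one vector per input string; unlist flattens it
vb <- unlist(strsplit(va, ";", fixed = TRUE))
vb
```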

R - generate all combinations from 2 vectors given constraints

I would like to generate all combinations of two vectors, given two constraints: there can never be more than 3 characters from the first vector, and there must always be at least one character from the second vector. I would also like to vary the final number of characters in the combination.
For instance, here are two vectors:
vec1=c("A","B","C","D")
vec2=c("W","X","Y","Z")
Say I wanted 3 characters in the combination. Possible acceptable permutations would be: "A" "B" "X" or "A" "Y" "Z". An unacceptable permutation would be: "A" "B" "C", since there is not at least one character from vec2.
Now say I wanted 5 characters in the combination. Possible acceptable permutations would be: "A" "C" "Z" "Y" or "A" "Y" "Z" "X". An unacceptable permutation would be: "A" "C" "D" "B" "X", since there are more than 3 characters from vec1.
I suppose I could use expand.grid to generate all combinations and then somehow subset, but there must be an easier way. Thanks in advance!
I'm not sure whether this is easier, but you can avoid generating permutations that do not satisfy your conditions with this strategy:
1. Generate all combinations from vec1 that are acceptable.
2. Generate all combinations from vec2 that are acceptable.
3. Generate all combinations taking one solution from 1. plus one solution from 2. Here I'd filter on the total length afterwards.
4. (If you're looking for combinations, you're done; otherwise:) produce all permutations of letters within each result.
Now, let's have
vec1 <- LETTERS [1:4]
vec2 <- LETTERS [23:26]
## lists can eat up lots of memory, so use character vectors instead.
combine <- function (x, y)
combn (y, x, paste, collapse = "")
res1 <- unlist (lapply (0:3, combine, vec1))
res2 <- unlist (lapply (1:length (vec2), combine, vec2))
Now we have:
> res1
[1] "" "A" "B" "C" "D" "AB" "AC" "AD" "BC" "BD" "CD" "ABC"
[13] "ABD" "ACD" "BCD"
> res2
[1] "W" "X" "Y" "Z" "WX" "WY" "WZ" "XY" "XZ" "YZ"
[11] "WXY" "WXZ" "WYZ" "XYZ" "WXYZ"
res3 <- outer (res1, res2, paste0)
res3 <- res3 [nchar (res3) == 5]
So here you are:
> res3
[1] "ABCWX" "ABDWX" "ACDWX" "BCDWX" "ABCWY" "ABDWY" "ACDWY" "BCDWY" "ABCWZ"
[10] "ABDWZ" "ACDWZ" "BCDWZ" "ABCXY" "ABDXY" "ACDXY" "BCDXY" "ABCXZ" "ABDXZ"
[19] "ACDXZ" "BCDXZ" "ABCYZ" "ABDYZ" "ACDYZ" "BCDYZ" "ABWXY" "ACWXY" "ADWXY"
[28] "BCWXY" "BDWXY" "CDWXY" "ABWXZ" "ACWXZ" "ADWXZ" "BCWXZ" "BDWXZ" "CDWXZ"
[37] "ABWYZ" "ACWYZ" "ADWYZ" "BCWYZ" "BDWYZ" "CDWYZ" "ABXYZ" "ACXYZ" "ADXYZ"
[46] "BCXYZ" "BDXYZ" "CDXYZ" "AWXYZ" "BWXYZ" "CWXYZ" "DWXYZ"
If you prefer the results split into single letters:
res <- matrix (unlist (strsplit (res3, "")), nrow = length (res3), byrow = TRUE)
> res
[,1] [,2] [,3] [,4] [,5]
[1,] "A" "B" "C" "W" "X"
[2,] "A" "B" "D" "W" "X"
[3,] "A" "C" "D" "W" "X"
[4,] "B" "C" "D" "W" "X"
(snip)
[51,] "C" "W" "X" "Y" "Z"
[52,] "D" "W" "X" "Y" "Z"
Which are your combinations.
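Since the question asks for a variable number of characters, the steps above can be wrapped in a function. This is only a sketch: the name constrained_combos and its parameters n and max_from_first are hypothetical, and the nchar filter assumes every element of both vectors is a single character.

```r
# Hypothetical wrapper around the approach above
constrained_combos <- function(vec1, vec2, n, max_from_first = 3) {
  # all size-k combinations of v, each collapsed into one string
  picks <- function(k, v) combn(v, k, paste, collapse = "")
  # 0 to max_from_first letters from vec1, at least one letter from vec2
  res1 <- unlist(lapply(0:min(max_from_first, length(vec1)), picks, vec1))
  res2 <- unlist(lapply(seq_along(vec2), picks, vec2))
  res <- outer(res1, res2, paste0)
  # keep only results with the requested total number of characters
  res[nchar(res) == n]
}

vec1 <- LETTERS[1:4]
vec2 <- LETTERS[23:26]
five  <- constrained_combos(vec1, vec2, n = 5)
three <- constrained_combos(vec1, vec2, n = 3)
```

With the vectors from the question, n = 5 gives the same 52 strings as res3 above.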
