R: Find unique vectors in a list of vectors

I have a list of vectors
list_of_vectors <- list(c("a", "b", "c"), c("a", "c", "b"), c("b", "c", "a"), c("b", "b", "c"), c("c", "c", "b"), c("b", "c", "b"), c("b", "b", "c", "d"), NULL)
For this list I would like to know which vectors are unique in terms of their elements, regardless of the order of those elements. That is, I would like the following output:
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "b" "b" "c"
[[3]]
[1] "c" "c" "b"
[[4]]
[1] "b" "b" "c" "d"
[[5]]
NULL
Is there a function in R for performing this check, or do I need to write a lot of workaround functions?
My current, not-so-elegant solution:
# Function for turning a vector into a single string, sorted alphabetically
stringer <- function(vector) {
  if (is.null(vector)) {
    return(NULL)
  } else {
    vector_ordered <- vector[order(vector)]
    vector_string <- paste(vector_ordered, collapse = "")
    return(vector_string)
  }
}
# Identifying unique strings
vector_strings_unique <- unique(lapply(list_of_vectors, stringer))
vector_strings_unique
[[1]]
[1] "abc"
[[2]]
[1] "bbc"
[[3]]
[1] "bcc"
[[4]]
[1] "bbcd"
[[5]]
NULL
# Function for splitting the strings back into vectors
splitter <- function(string) {
  if (is.null(string)) {
    return(NULL)
  } else {
    vector <- unlist(strsplit(string, split = ""))
    return(vector)
  }
}
# Applying function
lapply(vector_strings_unique, splitter)
[[1]]
[1] "a" "b" "c"
[[2]]
[1] "b" "b" "c"
[[3]]
[1] "b" "c" "c"
[[4]]
[1] "b" "b" "c" "d"
[[5]]
NULL
It does the trick and could be rewritten as a single function, but there must be a more elegant solution.

We can sort each list element, apply duplicated to get a logical index of the repeated elements, and subset the list with its negation:
list_of_vectors[!duplicated(lapply(list_of_vectors, sort))]
#[[1]]
#[1] "a" "b" "c"
#[[2]]
#[1] "b" "b" "c"
#[[3]]
#[1] "c" "c" "b"
#[[4]]
#[1] "b" "b" "c" "d"
#[[5]]
#NULL
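One reason to prefer the sort/duplicated answer over the string-based workaround: the pasted key can only be split back correctly when every element is a single character, and different vectors can collapse to the same key. A minimal check, using hypothetical two-character elements and the same sorted-then-pasted key that stringer() in the question builds:

```r
# Hypothetical vectors whose elements have more than one character
x <- list(c("ab", "c"), c("a", "bc"))

# Sorted-then-pasted key, equivalent to stringer() for vectors without NA
keys <- vapply(x, function(v) paste(sort(v), collapse = ""), character(1))
keys
# [1] "abc" "abc"

# sort() + duplicated() compares the sorted vectors themselves,
# so both (different) vectors are kept
x[!duplicated(lapply(x, sort))]
```

The string approach would wrongly treat these two vectors as the same one, while duplicated() on the list of sorted vectors keeps them apart.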

Related

Generate all combinations (and their sum) of a vector of characters in R

Suppose that I have a vector of length n and I need to generate all possible combinations and their sums. For example:
If n=3, we have:
myVec <- c("a", "b", "c")
Output =
"a"
"b"
"c"
"a+b"
"a+c"
"b+c"
"a+b+c"
Note that we consider a+b = b+a, so we only need to keep one.
Another example, for n = 4:
myVec <- c("a", "b", "c", "d")
Output:
"a"
"b"
"c"
"d"
"a+b"
"a+c"
"a+d"
"b+c"
"b+d"
"c+d"
"a+b+c"
"a+b+d"
"a+c+d"
"b+c+d"
"a+b+c+d"
We can use sapply over each combination size n, calling combn with paste (and collapse = "+") as the function applied to every combination:
sapply(seq_along(myVec), function(n) combn(myVec, n, paste, collapse = "+"))
#[[1]]
#[1] "a" "b" "c"
#[[2]]
#[1] "a+b" "a+c" "b+c"
#[[3]]
#[1] "a+b+c"
myVec <- c("a", "b", "c", "d")
sapply(seq_along(myVec), function(n) combn(myVec, n, paste, collapse = "+"))
#[[1]]
#[1] "a" "b" "c" "d"
#[[2]]
#[1] "a+b" "a+c" "a+d" "b+c" "b+d" "c+d"
#[[3]]
#[1] "a+b+c" "a+b+d" "a+c+d" "b+c+d"
#[[4]]
#[1] "a+b+c+d"
Wrap the call in unlist() if you need the output as a single vector.
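sapply decides its return type from the results (a list here, because the number of combinations differs between sizes), whereas lapply always returns a list. A minimal sketch that uses lapply for a predictable type and then flattens with unlist:

```r
myVec <- c("a", "b", "c", "d")

# One list element per combination size n = 1, ..., length(myVec);
# combn() passes collapse = "+" on to paste() for every combination
sums_by_size <- lapply(seq_along(myVec), function(n) combn(myVec, n, paste, collapse = "+"))

# Flatten into a single character vector
all_sums <- unlist(sums_by_size)
all_sums
length(all_sums)  # 15, i.e. 2^4 - 1 non-empty combinations
```

In general there are 2^n - 1 non-empty combinations, so the output grows quickly with the length of myVec.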

R remove an object from a list of vectors

I have a list of vectors and I would like to remove a specific element. Any ideas how to achieve that?
Let's say I would like to remove the element "F". How can I do that?
blocks <- list(
c("A", "B"),
c("C"),
c("D","E", "F")
)
We could also use setdiff with Map
Map(setdiff, blocks, 'F')
#[[1]]
#[1] "A" "B"
#[[2]]
#[1] "C"
#[[3]]
#[1] "D" "E"
or with lapply
lapply(blocks, setdiff, 'F')
#[[1]]
#[1] "A" "B"
#[[2]]
#[1] "C"
#[[3]]
#[1] "D" "E"
If you want to remove the third element of the third vector in your list, you could try:
blocks[[3]] <- blocks[[3]][-3]
blocks
# [[1]]
# [1] "A" "B"
#
# [[2]]
# [1] "C"
#
# [[3]]
# [1] "D" "E"
If you want to remove all elements equal to "F", you could use lapply with an anonymous function that drops every "F" from each vector in the list:
lapply(blocks, function(x) x[x != "F"])
# [[1]]
# [1] "A" "B"
#
# [[2]]
# [1] "C"
#
# [[3]]
# [1] "D" "E"
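The setdiff answers and the logical-subsetting answer agree on blocks, but they differ once a vector contains repeated values, because setdiff() treats each vector as a set and also drops duplicates. A minimal sketch with a hypothetical list blocks2:

```r
# Hypothetical list in which some vectors contain repeated values
blocks2 <- list(c("A", "A", "B"), c("D", "F", "D"))

# setdiff(): removes "F" and also collapses repeated values
lapply(blocks2, setdiff, "F")
# first: "A" "B"; second: "D"

# Logical subsetting: removes only "F" and keeps the repeats
lapply(blocks2, function(x) x[x != "F"])
# first: "A" "A" "B"; second: "D" "D"
```

Use the subsetting version if the multiplicity of the remaining values matters.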

Extract data between rows r

I have the following vector:
rep(c("foo",rep(c('A','B'),2),"bar",rep(c("C","D"),2)),2)
[1] "foo" "A" "B" "A" "B" "bar" "C" "D" "C" "D" "foo" "A"
[13] "B" "A" "B" "bar" "C" "D" "C" "D"
I would like to extract the data between 'foo' and 'bar' to get
[1] "A" "B" "A" "B" "A" "B" "A" "B"
How would you perform this task in R?
I used this approach, which I think is the easiest to understand and a more idiomatic R way of doing it:
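s <- rep(c("foo", rep(c('A','B'), 2), "bar", rep(c("C","D"), 2)), 2) # Your vector
# Indices strictly between each "foo" and the following "bar"
# (assumes they alternate, with at least one element in between)
idx <- unlist(Map(seq, which(s == "foo") + 1, which(s == "bar") - 1))
s[idx]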
#[1] "A" "B" "A" "B" "A" "B" "A" "B"
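Using a methodology similar to this answer:
# Collapse the vector into one string (only safe when every element is a single character)
temp <- paste(rep(c("foo",rep(c('A','B'),2),"bar",rep(c("C","D"),2)),2), collapse = "")
# Lazily match everything between each "foo" and the next "bar", then split back into characters
unlist(strsplit(regmatches(temp, gregexpr('(?<=foo).*?(?=bar)', temp, perl = TRUE))[[1]], ""))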
##[1] "A" "B" "A" "B" "A" "B" "A" "B"
Maybe this helps.
If vec is the vector:
vec[mapply(`:`, grep("foo", vec), grep("bar", vec))[-c(1,6),]]
#[1] "A" "B" "A" "B" "A" "B" "A" "B"
or
vec1 <- vec[mapply(`:`, grep("foo", vec), grep("bar", vec))]
vec1[!grepl(paste(c("foo","bar"),collapse="|"), vec1)]
#[1] "A" "B" "A" "B" "A" "B" "A" "B"
Update
For a vector like the one below:
vec1 <- c("foo", "A", "B", "bar", "C", "D", "bar", "foo", "A", "B", "A",
"bar", "C", "D", "D", "zoo", "A", "B", "foo", "A", "bar", "B", "A", "zoo",
"A", "foo", "A", "B")
you could use:
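fun1 <- function(vec, first, second) {
  # Start a new group at every occurrence of `first`
  lst <- split(vec, cumsum(vec == first))
  unlist(lapply(lst, function(x) {
    # Position just before the first `second` in this group
    indx <- match(second, x) - 1
    if (!is.na(indx) && indx > 1) {
      x[2:indx]
    }
  }), use.names = FALSE)
}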
fun1(vec1, "foo", "bar")
#[1] "A" "B" "A" "B" "A" "A"
fun1(vec, "foo", "bar")
#[1] "A" "B" "A" "B" "A" "B" "A" "B"
By the way, @David Arenburg's method works for both cases.
Assuming that foo is first among the two and that they alternate:
vec <- rep(c("foo",rep(c('A','B'),2),"bar",rep(c("C","D"),2)),2)
idx <- vec %in% c("foo", "bar")
vec[cumsum(idx) %% 2 == 1 & !idx]
# [1] "A" "B" "A" "B" "A" "B" "A" "B"
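The cumsum approach also copes with an empty block (a "bar" immediately after a "foo"), whereas the seq()-based index goes wrong there, because seq(from, to) counts down when to < from. A minimal sketch; the helper name between_markers and the test vector v are hypothetical, and the same assumption applies (foo comes first and the two alternate):

```r
# Hypothetical helper wrapping the cumsum idea
between_markers <- function(x, first, second) {
  is_marker <- x %in% c(first, second)
  # An odd running count of markers means we are inside a first...second block
  x[cumsum(is_marker) %% 2 == 1 & !is_marker]
}

# Hypothetical vector with an empty block ("bar" right after "foo")
v <- c("foo", "bar", "X", "foo", "A", "B", "bar")
between_markers(v, "foo", "bar")
# returns "A" "B"

# The seq()-based index for the empty block is seq(2, 1), i.e. c(2, 1)
v[unlist(Map(seq, which(v == "foo") + 1, which(v == "bar") - 1))]
# returns "bar" "foo" "A" "B"
```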

Replace rows in an array of lists with rows from matrix

I currently have an array of lists and a matrix that are produced with this code:
library(gtools)
FiveStates = array(list(NULL), c(32,2))
four.1 = combinations(5,4,c(LETTERS[1:5]))
four.2 = four.1[nrow(four.1):1,]
I'd like to replace rows 2-6 of the first column in the array FiveStates with all five rows in the matrix four.2 elementwise. How can I do this without having to replace each row separately?
Edit: I'd like to make FiveStates[2,1] show "B", "C", "D", "E"; FiveStates[3,1] show "A", "C", "D", "E"; and so on, so that the 2nd to 6th entries in the first column of FiveStates contain vectors that match the five rows of four.2.
(Also, the package you need to use the combinations() function is now in the code. Sorry about that.)
You can replace blocks of a list-matrix (a matrix whose cells hold arbitrary R objects) using regular matrix indexing:
FiveStates[2:6, 1] <- lapply(seq_len(nrow(four.2)), function(x) four.2[x, ])
FiveStates[2:6, 1]
[[1]]
[1] "B" "C" "D" "E"
[[2]]
[1] "A" "C" "D" "E"
[[3]]
[1] "A" "B" "D" "E"
[[4]]
[1] "A" "B" "C" "E"
[[5]]
[1] "A" "B" "C" "D"
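An alternative to the lapply call is split(m, row(m)), which turns a matrix into a list of its rows in one step. A minimal sketch; to keep it self-contained it builds four.1 with base combn() instead of gtools::combinations(), assuming both list the combinations in the same lexicographic order (which matches the rows shown above):

```r
# Base-R stand-in for gtools::combinations(5, 4, LETTERS[1:5])
four.1 <- t(combn(LETTERS[1:5], 4))
four.2 <- four.1[nrow(four.1):1, ]

FiveStates <- array(list(NULL), c(32, 2))

# split() walks the matrix column by column and groups the values by row
# number, giving one character vector per row of four.2
FiveStates[2:6, 1] <- split(four.2, row(four.2))

FiveStates[[2, 1]]
# returns "B" "C" "D" "E"
```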

generate labels for variables in R

I'm searching for a better/faster way than this one to generate labels for a variable:
df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"
df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
For simplicity, this example has only one variable. Of course, my dataset has more variables, but I only want to change this specific one.
You don't need sapply:
df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
You could also produce the same result with factors:
df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"
# one way; older R versions warned about the duplicated labels, but since R 3.4.0 they are merged into a single level
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))
# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)
# a third strategy using `library(car)`
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")
# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
You might also consider mapvalues or revalue from the plyr package, particularly if you're dealing with more labels:
df$a <- plyr::mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
df$a[! df$a %in% c("b", "c", "d")] <- "a" # The !pick values
Here is another fairly straightforward solution:
names(pick) <- c("b", "c", "c", "c", "d")
x <- names(pick[match(df$a, pick)])
x[is.na(x)] <- "a"
x
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
It is even more straightforward if you include an NA in your "pick" object.
pick <- c(NA, 0, 1, 2, 3, 10)
names(pick) <- c("a", "b", "c", "c", "c", "d")
names(pick[match(df$a, pick, nomatch = 1)])
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
If you use this second alternative, note that nomatch is the integer that match() returns for values not found in pick, and that integer is then used to index pick. It should therefore be the position of the NA in your "pick" vector: here the NA is in the first position, so nomatch = 1. If the NA were in the last position, you would use nomatch = 6 instead.
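To illustrate the NA-last variant, a minimal sketch reusing df from the question; nomatch = length(pick) is an illustrative choice that equals 6 here and stays correct if more values are added before the NA:

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# NA placed last; its name is the label for values not in the lookup
pick <- c(0, 1, 2, 3, 10, NA)
names(pick) <- c("b", "c", "c", "c", "d", "a")

# Values with no match get index length(pick), i.e. the NA entry named "a"
labels_last <- names(pick[match(df$a, pick, nomatch = length(pick))])
labels_last
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```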
You can also use the ifelse function:
with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
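Another option in the same spirit as the names(pick) answer is a named character vector used as a lookup table, indexed by the values converted to strings. A minimal sketch; the name lookup is hypothetical, and it assumes each value converts to a string exactly as written in the table (true for the integers here, but worth checking for non-integer doubles):

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

# Hypothetical lookup table: names are the original values, elements are the labels
lookup <- c("0" = "b", "1" = "c", "2" = "c", "3" = "c", "10" = "d")

# Indexing by a name that is not in the table returns NA, which becomes "a"
new_a <- unname(lookup[as.character(df$a)])
new_a[is.na(new_a)] <- "a"
new_a
# [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
```

Adding a label then only means adding one entry to lookup, without another nested ifelse.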
