Non-consecutive combinations of array elements in R

I want to generate all the possible combinations of nonadjacent elements in an array.
For example:
array_a <- c("A","B","C")
Here the possible combinations would be AC and CA.
How can I implement this in R?

If nonadjacent elements are defined as elements whose positions differ by more than one in absolute value, then one option could be (shown with five-element sample data):
array_a <- c("A", "B", "C", "D", "E")
mat <- which(as.matrix(dist(seq_along(array_a))) > 1, arr.ind = TRUE)
paste0(array_a[mat[, 1]], array_a[mat[, 2]])
[1] "CA" "DA" "EA" "DB" "EB" "AC" "EC" "AD" "BD" "AE" "BE" "CE"
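If order does not matter (so AC and CA count as a single combination), a minimal base R sketch under the same "positions differ by more than one" assumption could use combn(), which enumerates each index pair once with i < j. The variable names pairs, keep and unordered are only illustrative.

```r
array_a <- c("A", "B", "C", "D", "E")

# Each column of `pairs` is one index pair (i, j) with i < j
pairs <- combn(seq_along(array_a), 2)

# Keep only pairs whose positions are more than one apart
keep <- pairs[2, ] - pairs[1, ] > 1

unordered <- paste0(array_a[pairs[1, keep]], array_a[pairs[2, keep]])
unordered
# returns c("AC", "AD", "AE", "BD", "BE", "CE")
```

Pasting the second index before the first would give the reversed (CA-style) half of the ordered result.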

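We can use outer to paste every element with every element (note that this also includes adjacent pairs and self-pairs such as "AA"):
c(outer(array_a, array_a, FUN = paste, sep=""))
Or, if we only want to combine alternate elements (positions 1, 3, 5, ...):
outer(array_a[c(TRUE, FALSE)], array_a[c(TRUE, FALSE)], FUN = paste, sep="")
Or using crossing from tidyr, dropping the self-pairs with filter: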
library(dplyr)
library(tidyr)
crossing(v1 = array_a[c(TRUE, FALSE)],
v2 = array_a[c(TRUE, FALSE)]) %>%
filter(v1 != v2) %>%
unite(v1, v1, v2, sep="") %>%
pull(v1)
#[1] "AC" "CA"
NOTE: The question does not make clear what counts as non-adjacent, so this answer is based on a different assumption (pairing alternate elements only).
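The outer() result for alternate elements still contains self-pairs such as "AA" and "CC". A minimal base R sketch, under the same every-other-element assumption, that drops them without dplyr/tidyr keeps only the off-diagonal cells of the outer() matrix:

```r
array_a <- c("A", "B", "C")

# Alternate elements (positions 1, 3, 5, ...): here "A" and "C"
alt <- array_a[c(TRUE, FALSE)]

# m[i, j] is paste0(alt[i], alt[j]); the diagonal holds the self-pairs
m <- outer(alt, alt, FUN = paste0)

# Off-diagonal cells, read in column-major order
res <- m[row(m) != col(m)]
res
# returns c("CA", "AC")
```

The elements are the same as in the crossing() answer; only the order differs, because the matrix is read column by column.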

Another base R option using expand.grid + subset
inds <- subset(expand.grid(seq_along(array_a), seq_along(array_a)), abs(Var1 - Var2) > 1)
paste0(array_a[inds$Var1],array_a[inds$Var2])

The @tmfmnk solution is neat. Still, I want to add an alternative of my own.
I use the arrangements package for permutations without repetition:
array_a <- c("A", "B", "C", "D", "E")
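# adjacent pairs such as "BA"; they and their reverses are removed from the permutations
vec = paste0(array_a[-1], head(array_a, -1))
# all ordered pairs without repetition, collapsed to strings such as "AB"
cc = apply(arrangements::permutations(array_a, 2, replace = FALSE), 1, function(x) paste0(x, collapse = ""))
setdiff(cc, c(vec, stringi::stri_reverse(vec)))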
[1] "AC" "AD" "AE" "BD" "BE" "CA" "CE" "DA" "DB" "EA" "EB" "EC"
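A base R sketch of the same permutations-minus-neighbors idea, without the arrangements and stringi packages, is below. It builds the reversed neighbor pairs with paste0() instead of stri_reverse(); stri_reverse() reverses characters, so it only yields the reversed pair when every element is a single character. Ordered pairs are generated from indices (i != j) with expand.grid().

```r
array_a <- c("A", "B", "C", "D", "E")

# Adjacent pairs in both directions, e.g. "BA" and "AB"
neighbors <- c(paste0(array_a[-1], head(array_a, -1)),
               paste0(head(array_a, -1), array_a[-1]))

# All ordered index pairs without repetition (i != j)
g <- expand.grid(i = seq_along(array_a), j = seq_along(array_a))
g <- g[g$i != g$j, ]
cc <- sort(paste0(array_a[g$i], array_a[g$j]))

# setdiff() keeps the sorted order of cc
res <- setdiff(cc, neighbors)
res
```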

Related

Replace the same occurrences by multiple strings

Let's say I have a vector with multiple strings:
a<- c('a?cd','ab?cd','abc?')
How can I replace the first "?" by b the second "?" by c and the third "?" by d, in order to produce a result like this:
'abcd','abcd','abcd'
Update, building on the answer from G. Grothendieck: in case a single element contains two "?" symbols that should be replaced by different strings:
a <- c('espa?a','per? an?n','peque?os')
L <- c('N','U','O','N')
# turn each "?" into a sprintf placeholder
fmt <- gsub("[?]", "%s", a)
# group id of each replacement (assumes every element contains at least one "?")
g <- cumsum(sequence(nchar(gsub("[^?]", "", a))) == 1)
mapply(function(fmt, x) do.call("sprintf", as.list(c(fmt, x))), fmt, split(L, g), USE.NAMES = FALSE)
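The cumsum/sequence grouping above assumes every element contains at least one "?". A minimal base R sketch that also tolerates elements without any "?" counts the placeholders with gregexpr() and replaces them one at a time with sub(fixed = TRUE). The function name fill_placeholders is hypothetical.

```r
fill_placeholders <- function(a, L) {
  # Number of literal "?" in each element (0 when there is no match)
  n_q <- lengths(regmatches(a, gregexpr("?", a, fixed = TRUE)))
  stopifnot(sum(n_q) == length(L))
  # One chunk of replacements per element; factor levels keep empty chunks
  chunks <- split(L, factor(rep(seq_along(a), n_q), levels = seq_along(a)))
  mapply(function(s, r) {
    # Replace the first remaining "?" with each replacement in turn
    for (x in r) s <- sub("?", x, s, fixed = TRUE)
    s
  }, a, chunks, USE.NAMES = FALSE)
}

fill_placeholders(c('espa?a', 'per? an?n', 'peque?os'), c('N', 'U', 'O', 'N'))
# returns c("espaNa", "perU anOn", "pequeNos")
```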
Apply chartr across each component as follows. Note that head(...) is c("b", "c", "d"). No packages are used.
a<- c('a?cd','ab?cd','abc?') # test input
mapply(chartr, "?", head(letters[-1], length(a)), a, USE.NAMES = FALSE)
## [1] "abcd" "abccd" "abcd"
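One detail worth knowing about the chartr approach: chartr translates every occurrence of the character, whereas sub replaces only the first match. A small base R illustration (the input "a??" is made up for the example):

```r
x <- "a??"

# chartr maps every "?" to "b"
all_replaced <- chartr("?", "b", x)

# sub with fixed = TRUE replaces only the first "?"
first_replaced <- sub("?", "b", x, fixed = TRUE)

c(all_replaced, first_replaced)
# returns c("abb", "ab?")
```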
If what you meant was to check whether any of "a", "b", "c", "d" is missing from each component and, if so, replace ? with the missing letter, then first create a list L of replacements and then apply sub to each component with it. We assume that each component has 0 or 1 missing letters and 0 or 1 instances of ?. Again, no packages are used.
L <- lapply(strsplit(a, ""), setdiff, x = letters[1:4])
L[lengths(L) == 0] <- ""
mapply(`sub`, "[?]", L, a, USE.NAMES = FALSE)
## [1] "abcd" "abcd" "abcd"
stringr::str_replace() has vectorized replacement so you can do:
library(stringr)
str_replace(a, "\\?", letters[seq_along(a) + 1])
[1] "abcd" "abccd" "abcd"
You can use str_replace from the stringr package:
library(stringr)
a<- c('a?cd','ab?cd','abc?')
str_replace(a,"[?]",letters[2:4])
[1] "abcd" "abccd" "abcd"
or
str_replace(a, "[?]", c("b", "c", "d"))
[1] "abcd" "abccd" "abcd"
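If you prefer to stay in base R, a sketch of the same one-replacement-per-element idea uses regexpr() together with the regmatches<- replacement function. It assumes every element contains a "?", since regmatches<- expects one replacement per matched element.

```r
a <- c('a?cd', 'ab?cd', 'abc?')

# Position of the first literal "?" in each element
m <- regexpr("?", a, fixed = TRUE)

# Replace each matched "?" with the corresponding letter
regmatches(a, m) <- c("b", "c", "d")
a
# now c("abcd", "abccd", "abcd")
```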

R use mapply on nested list

Using base R, I'd like to use the mapply function on a nested list. For example, in the code below, I'm trying to remove the letter "a" from each element of a nested list. I'd like to replace the last two lines with just a single line of code.
mylist <- list(
list(c("a", "b", "c"), c("d", "e", "f")),
list(c("a", "v", "w"), c("x", "y"), c("c", "b", "a"))
)
mylist
not_a <- lapply(mylist, lapply, `!=`, "a")
not_a
mylist[[1]] <- mapply(`[`, mylist[[1]], not_a[[1]], SIMPLIFY = FALSE)
mylist[[2]] <- mapply(`[`, mylist[[2]], not_a[[2]], SIMPLIFY = FALSE)
One option could be:
rapply(mylist, how = "replace", function(x) x[x != "a"])
[[1]]
[[1]][[1]]
[1] "b" "c"
[[1]][[2]]
[1] "d" "e" "f"
[[2]]
[[2]][[1]]
[1] "v" "w"
[[2]][[2]]
[1] "x" "y"
[[2]][[3]]
[1] "c" "b"
Or using map2 from purrr:
library(purrr)
map2(mylist, not_a, ~ map2(.x, .y, `[`))
Or using map_depth (if the OP is interested only in the final outcome)
map_depth(mylist, 2, ~ .x[.x != 'a'])
#[[1]]
#[[1]][[1]]
#[1] "b" "c"
#[[1]][[2]]
#[1] "d" "e" "f"
#[[2]]
#[[2]][[1]]
#[1] "v" "w"
#[[2]][[2]]
#[1] "x" "y"
#[[2]][[3]]
#[1] "c" "b"
Or, more compactly (note that setdiff also drops duplicated values within each vector):
map_depth(mylist, 2, setdiff, 'a')
A nested Map/mapply does what the question asks for:
Map(function(i) mapply(`[`, mylist[[i]], not_a[[i]], SIMPLIFY = FALSE), seq_along(mylist))
Simpler:
Map(function(x, y) Map(`[`, x, y), mylist, not_a)
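rapply() recurses into a single list only. If you need mapply-style behavior on two identically shaped nested lists at any depth, a small recursive sketch could look like this; rmap2 is a hypothetical helper, not a base R function, and it assumes both lists have exactly the same structure.

```r
rmap2 <- function(x, y, f) {
  # Recurse while we are still looking at lists; apply f at the leaves
  if (is.list(x)) Map(rmap2, x, y, MoreArgs = list(f = f)) else f(x, y)
}

mylist <- list(
  list(c("a", "b", "c"), c("d", "e", "f")),
  list(c("a", "v", "w"), c("x", "y"), c("c", "b", "a"))
)
not_a <- lapply(mylist, lapply, `!=`, "a")

# At each leaf this calls `[`(x, y), i.e. x[y]
res <- rmap2(mylist, not_a, `[`)
```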

How to remove duplicate elements from two lists (pairwise)?

I have two very large lists (13000 elements). I would like to remove the duplicates pairwise, i.e. remove element i from both lists if the i-th elements of the two lists are the same.
The function unique() works very well for a single list, but does not work pairwise.
a = matrix(c(50,70,45,89), ncol = 2)
b = matrix(c(45,86), ncol = 2)
c = matrix(c(20,35), ncol = 2)
df1 = list(a,b,c)
df2 = list(a,b,a)
df3 = cbind(df1,df2)
v = unique(df3, incomparables = FALSE)
In the end, the expected result would be df1 = list(c) and df2 = list(a). Do you have a good approach for this? Thank you a lot!
If each component of your lists holds a single element, then you can do:
df1 <- list("a", "b", "c")
df2 <- list("a", "b", "a")
comp <- unlist(df1) != unlist(df2)
df1[comp]
[[1]]
[1] "c"
df2[comp]
[[1]]
[1] "a"
Is that what you were looking for?
A more generic solution (whatever you have in your lists) using purrr would be:
comp2 <- !purrr::map2_lgl(df1, df2, identical)
df1[comp2]
[[1]]
[1] "c"
df2[comp2]
[[1]]
[1] "a"
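You can try the following (like the first answer, it compares with x != y, so it assumes each component is a single value):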
Filter(length, Map(function(x, y) x[x != y], df1, df2))
#[[1]]
#[1] "c"
Filter(length, Map(function(x, y) x[x != y], df2, df1))
#[[1]]
#[1] "a"
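For the matrices in the question, x != y fails when the paired matrices have different dimensions (c versus a). A base R sketch of the identical() idea from the purrr answer, using vapply() so the result is always a logical vector; the matrix c is renamed c_mat here only to avoid confusion with the c() function.

```r
a <- matrix(c(50, 70, 45, 89), ncol = 2)
b <- matrix(c(45, 86), ncol = 2)
c_mat <- matrix(c(20, 35), ncol = 2)   # "c" in the question
df1 <- list(a, b, c_mat)
df2 <- list(a, b, a)

# TRUE where the i-th elements of both lists are identical
same <- vapply(seq_along(df1), function(i) identical(df1[[i]], df2[[i]]), logical(1))

# Drop those positions from both lists
df1 <- df1[!same]
df2 <- df2[!same]
```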

How to concatenate strings after str_split

Given this data frame:
column_1 column_2
A        w,x
B        z
C        q,r,s
My desired output would be
"Aw", "Ax", "Bz", "Cq", "Cr", "Cs"
I've tried
paste0(df$column_1, strsplit(df$column_2, ","))
But the output is
"Ac(\"w\", \"x\")" "Bz" "Cc(\"q\", \"r\", \"s\")"
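We can split column_2 on "," and paste the pieces with column_1 using mapply (USE.NAMES = FALSE stops mapply from using column_1 as names, so the result is an unnamed vector):
unlist(mapply(paste0, df$column_1, strsplit(df$column_2, ","), USE.NAMES = FALSE))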
#[1] "Aw" "Ax" "Bz" "Cq" "Cr" "Cs"
We can replicate column_1 by the lengths of the list returned by strsplit and then paste:
lst1 <- strsplit(df$column_2, ",")
paste0(rep(df$column_1, lengths(lst1)), unlist(lst1))
#[1] "Aw" "Ax" "Bz" "Cq" "Cr" "Cs"
NOTE: This is a vectorized approach, as we are not looping through the list.
Or use stack to create a two-column data.frame from the list and then paste:
do.call(paste0, stack(setNames(lst1, df$column_1))[2:1])
#[1] "Aw" "Ax" "Bz" "Cq" "Cr" "Cs"
Stacking into a two-column data.frame may be a bit less efficient than the first approach.
Or, with the tidyverse, split column_2 into long format with separate_rows, then unite the two columns and pull the result as a vector:
library(tidyverse)
df %>%
separate_rows(column_2) %>%
unite(newcol, column_1, column_2, sep="") %>%
pull(newcol)
#[1] "Aw" "Ax" "Bz" "Cq" "Cr" "Cs"
The issue with the OP's approach is that strsplit returns a list of vectors. We need either a function that loops over the list (lapply/sapply/vapply) or to unlist the list into a vector while replicating column_1 so that the lengths match when pasting.
data
df <- structure(list(column_1 = c("A", "B", "C"), column_2 = c("w,x",
"z", "q,r,s")), class = "data.frame", row.names = c(NA, -3L))
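To see the issue described above concretely: paste0() converts a list argument with as.character(), which deparses any component longer than one into a string such as c("w", "x"). A short base R sketch of the failure and of the lapply-based fix:

```r
df <- data.frame(column_1 = c("A", "B", "C"),
                 column_2 = c("w,x", "z", "q,r,s"),
                 stringsAsFactors = FALSE)

lst1 <- strsplit(df$column_2, ",")

# paste0() calls as.character() on the list, which deparses c("w", "x")
bad <- paste0(df$column_1, lst1)

# Looping over the list pastes each prefix onto every piece
good <- unlist(lapply(seq_along(lst1), function(i) paste0(df$column_1[i], lst1[[i]])))
```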
This can also be achieved with the code below, although it is not very idiomatic:
df <- data.frame(column_1 = c("A", "B", "C"), column_2 = c("w,x", "z", "q,r,s"))
l_vals <- strsplit(as.character(df$column_2), split = ",", perl =TRUE)
l_append = list()
for(i in seq_along(l_vals)){
l_append <- c(l_append,paste0(df$column_1[i], l_vals[[i]]))
}
unlist(l_append)

Disambiguate non-unique elements in a character vector

Given a vector of non-unique patient initials:
init = c("AA", "AB", "AB", "AB", "AC")
Looking for disambiguation as follows:
init1 = c("AA", "AB01", "AB02", "AB03", "AC")
i.e. unique initials should be left unchanged, while non-unique ones are disambiguated by appending two-digit numbers.
Use the function below with ave:
uniquify <- function(x) if (length(x) == 1) x else sprintf("%s%02d", x, seq_along(x))
ave(init, init, FUN = uniquify)
## [1] "AA" "AB01" "AB02" "AB03" "AC"
If the basic requirement is just to ensure unique output, then make.unique(x) or make.unique(x, sep = "0"), as discussed in another answer and a comment, are concise. However, if the output must be exactly as in the question, they do not give the same result, since make.unique leaves the first occurrence of each duplicate unchanged. With 10 or more duplicates the output of those approaches differs even more, whereas the solution here still gives the requested format. Here is a further example illustrating 10 or more duplicates:
xx <- rep(c("A", "B", "C"), c(1, 10, 2))
ave(xx, xx, FUN = uniquify)
## [1] "A" "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10" "C01" "C02"
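A variant sketch of the same rule computes each group's size and each element's running index separately with ave() and then chooses with ifelse(). It should reproduce the outputs above, but treat it as an alternative illustration rather than a replacement for uniquify; the name disambiguate is hypothetical.

```r
disambiguate <- function(x) {
  # Size of the group each element belongs to
  n <- ave(seq_along(x), x, FUN = length)
  # Running index of each element within its group (1, 2, 3, ...)
  idx <- ave(seq_along(x), x, FUN = seq_along)
  # Only elements from groups of size > 1 get a two-digit suffix
  ifelse(n > 1, sprintf("%s%02d", x, idx), x)
}

disambiguate(c("AA", "AB", "AB", "AB", "AC"))
# returns c("AA", "AB01", "AB02", "AB03", "AC")
```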
The make.unique solution could be rescued like this:
