This question is very similar to this one and this one but it combines elements of both in a way that I can't figure out on my own.
I have the following list.
original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12), group_3 = as.character(13:20))
I want to create new groups based on these original groups. There is a constraint - each new group must contain an equal number of items from each original group. Furthermore, items cannot be used more than once. For example, if we take one item from each original group, we may get the following new groups.
Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group <- 1
Number_of_New_Groups <- 3
# option 1
new_groups <- list(group_1 = as.character(c(1, 7, 13)), group_2 = as.character(c(2, 8, 14)), group_3 = as.character(c(3, 9, 15)))
# option 2
new_groups <- list(group_1 = as.character(c(1, 7, 13)), group_2 = as.character(c(2, 8, 14)), group_3 = as.character(c(3, 9, 16)))
# option 3
new_groups <- list(group_1 = as.character(c(1, 8, 13)), group_2 = as.character(c(2, 7, 14)), group_3 = as.character(c(3, 9, 15)))
There are three things that make what I'm hoping to do really tricky. First, I want to generate all possible combinations since this operation is part of a larger function. Second, I want to have the option to have multiple items from each original group end up in each new group. Third, I also want to have the option to choose how many new groups there will be. Here is another example.
Number_of_Items_From_Each_Original_Group_to_End_up_in_Each_New_Group <- 2
Number_of_New_Groups <- 3
# option 1
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
# option 2
new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 19)))
# option 3
new_groups <- list(group_1 = as.character(c(1, 3, 7, 8, 13, 14)), group_2 = as.character(c(2, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18)))
If each original group contained at least 9 items, I could even make new groups that each contain 3 items from each original group. Alternatively, if each original group contained at least 9 items, I could also increase the number of new groups to 4 if only 2 items from each original group end up in each new group.
Note that the original groups don't all need to contain the same amount of items for this process to work - the third original group contains more items than the other two original groups.
Also, note that item order doesn't matter within new groups. In other words, new_groups <- list(group_1 = as.character(c(1, 2, 7, 8, 13, 14)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18))) is the same as new_groups <- list(group_1 = as.character(c(2, 1, 8, 7, 14, 13)), group_2 = as.character(c(3, 4, 9, 10, 15, 16)), group_3 = as.character(c(5, 6, 11, 12, 17, 18))), so in my final output, I would only want one of these options reported.
Finally, note that the number of original groups won’t always equal the number of new groups - they just happen to in this example.
All solutions are welcome but I'd be especially curious to see one using only base functions.
Thank you!
I suggest this solution: it does not use any other library than base.
Define a permutations function, to compute all possibile combinations of elements of a vector (or list) vec taken sublen elements at a time (likewise the combn function)
permutations <- function(vec, sublen, prev_vec=NULL){
out_list <- list()
if(sublen==1){
for(v in vec){
out_list <- append(out_list,list(append(prev_vec,list(v))))
}
} else {
for (i in 1:(length(vec)-sublen+1)){
v <- vec[1]
prev_vec0 <- c(prev_vec,vec[1])
vec <- vec[2:length(vec)]
perm_list <- permutations(
vec=vec,
sublen=sublen-1,
prev_vec=prev_vec0
)
out_list <- append(out_list,perm_list)
}
}
return(out_list)
}
Define a find_matrix function that unlists matrices from deeply nested lists (source)
find_matrix <- function(x) {
if (is.matrix(x))
return(list(x))
if (!is.list(x))
return(NULL)
unlist(lapply(x, find_matrix), FALSE)
}
Define a compatible_rows function, that extract from a dataframe a subset of rows which can be used to create the other output vectors, given an output vector
compatible_rows <- function(df,row_value){
row_ids <- c()
if(is.null(nrow(df))){
return(NULL)
} else {
for (row_id in 1:nrow(df)){
row_ids <- c(row_ids,!any(row_value %in% df[row_id,]))
}
return(df[which(row_ids),])
}
}
Create a new_groups_list function, that computes all possible output matrices
new_groups_list <- function(df, prev_df=NULL, lvl=-1, verbose=F){
lvl <- lvl+1
results_list <- list()
if(is.null(nrow(df))){
if(verbose==T) cat(paste0("--BRANCH END (BEGIN lvl ",lvl,")--\n"))
prev_df0 <- rbind(prev_df,df)
rownames(prev_df0) <- NULL
if(verbose==T) cat("returned\n")
if(verbose==T) print(prev_df0)
if(verbose==T) cat("\n")
return(prev_df0)
}
if(nrow(df)==0){
if(verbose==T) cat(paste0("--BRANCH END (BEGIN lvl ",lvl,")--\n"))
prev_df0 <- rbind(prev_df,df)
rownames(prev_df0) <- NULL
if(verbose==T) cat("returned\n")
if(verbose==T) print(prev_df0)
if(verbose==T) cat("\n")
return(prev_df0)
}
for(row_id in 1:nrow(df)){
if(verbose==T) cat(paste("-- lvl",lvl,"cycle",row_id,"--\n"))
if(verbose==T) cat("initial results list\n")
if(verbose==T) print(results_list)
if(verbose==T) cat("\n")
if(verbose==T) cat("df in\n")
if(verbose==T) assign("last_df",df,envir = .GlobalEnv)
if(verbose==T) print(df)
if(verbose==T) cat("\n")
if(is.null(nrow(df))){
prev_df0 <- rbind(prev_df,df)
rownames(prev_df0) <- NULL
if(verbose==T) cat(paste0("--BRANCH END (MID lvl ",lvl,")--\n"))
if(verbose==T) cat("returned\n")
results_list <- append(results_list,list(prev_df0))
if(verbose==T) print(results_list)
if(verbose==T) cat("\n")
return(results_list)
}
considered_row <- df[1,]
if(verbose==T) assign("last_considered_row",considered_row,envir = .GlobalEnv)
if(verbose==T) cat("considered rows\n")
if(verbose==T) print(considered_row)
if(verbose==T) cat("\n")
df <- df[2:nrow(df),]
if(verbose==T) assign("last_df",df,envir = .GlobalEnv)
if(verbose==T) cat("df without considered rows\n")
if(verbose==T) print(df)
if(verbose==T) cat("\n")
prev_df0 <- rbind(prev_df,considered_row)
rownames(prev_df0) <- NULL
if(verbose==T) assign("last_prev0",prev_df0,envir = .GlobalEnv)
if(verbose==T) cat("collected considered rows\n")
if(verbose==T) print(prev_df0)
if(verbose==T) cat("\n")
comp_df <- compatible_rows(df,considered_row)
if(verbose==T) assign("last_comp_df",comp_df,envir = .GlobalEnv)
if(verbose==T) cat("compatible rows in df\n")
if(verbose==T) print(comp_df)
if(verbose==T) cat("\n")
if(verbose==T) cat(paste(">>> GOING TO LVL",lvl+1,"\n\n"))
new_rows <- new_groups_list(
comp_df,
prev_df=prev_df0,
lvl=lvl,
verbose=verbose
)
if(verbose==T) cat(paste0("--ROOT (lvl ",lvl,")--\n"))
if(verbose==T) cat("result received from branch\n")
if(verbose==T) print(new_rows)
if(verbose==T) cat("\n")
results_list <- append(results_list,list(new_rows))
if(verbose==T) cat("results list\n")
if(verbose==T) print(results_list)
if(verbose==T) cat("\n")
}
return(results_list)
}
Create create_new_groups, which wraps-up all the other functions, and outputs the whole list of possible solutions
create_new_groups <- function(original_groups, max_output = NULL){
min_len_original_groups = min(lengths(original_groups))
num_original_groups = length(original_groups)
max_len_subgroup <- floor(min_len_original_groups/2)
if(min_len_original_groups<2){
return("Not possible to populate new groups: at least one original group has less than 2 elements")
}
NewGroups_subLen_len_num <- list()
for (len_subgroup in 1:max_len_subgroup){
new_group_params <- c(
len_subgroup,
len_subgroup*num_original_groups,
floor(min_len_original_groups/len_subgroup)
)
NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
}
out_list <- list()
ind <- 1
for (e in 1:length(NewGroups_subLen_len_num)){
NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
elem_list <- list()
ind <- 1
# print(ind)
for (o in 1:length(original_groups)){
original_group <- original_groups[[o]]
elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
ind <- ind+1
}
out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
}
results_list <- list()
config_test <<- NewGroups_subLen_len_num
for (config_id in 1:length(NewGroups_subLen_len_num)){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
# config_test <<- config
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
# out_test <<- new_groups
results_list[[config_id]] <- new_groups
}
return(results_list)
}
Given a simple input, like
original_groups <- list(
group_1 = as.character(1:2),
group_2 = as.character(3:4),
group_3 = as.character(5:7)
)
The output of create_new_groups(original_groups) is
> create_new_groups_modified(original_groups)
[[1]]
[[1]][[1]]
[,1] [,2] [,3]
[1,] "1" "3" "5"
[2,] "2" "4" "6"
[[1]][[2]]
[,1] [,2] [,3]
[1,] "1" "3" "5"
[2,] "2" "4" "7"
[[1]][[3]]
[,1] [,2] [,3]
[1,] "2" "3" "5"
[2,] "1" "4" "6"
[[1]][[4]]
[,1] [,2] [,3]
[1,] "2" "3" "5"
[2,] "1" "4" "7"
[[1]][[5]]
[,1] [,2] [,3]
[1,] "1" "4" "5"
[2,] "2" "3" "6"
[[1]][[6]]
[,1] [,2] [,3]
[1,] "1" "4" "5"
[2,] "2" "3" "7"
[[1]][[7]]
[,1] [,2] [,3]
[1,] "2" "4" "5"
[2,] "1" "3" "6"
[[1]][[8]]
[,1] [,2] [,3]
[1,] "2" "4" "5"
[2,] "1" "3" "7"
[[1]][[9]]
[,1] [,2] [,3]
[1,] "1" "3" "6"
[2,] "2" "4" "7"
[[1]][[10]]
[,1] [,2] [,3]
[1,] "2" "3" "6"
[2,] "1" "4" "7"
[[1]][[11]]
[,1] [,2] [,3]
[1,] "1" "4" "6"
[2,] "2" "3" "7"
[[1]][[12]]
[,1] [,2] [,3]
[1,] "2" "4" "6"
[2,] "1" "3" "7"
Moreover, the create_new_groups function also creates a global variable config_test where all possible configurations, for a given list of groups (i.e., original_groups), are stored. For example, for the previous problem, config_test is equal to
> config_test
[[1]]
[1] 1 3 2
So, for this problem only one output configuration is possible, having this structure:
the number of elements taken from each original group and used in each output group is 1
the length of the output groups is 6
the number of output groups (in each possible combination) is 2
Given a slightly more complex example
original_groups <- list(
group_1 = as.character(1:4),
group_2 = as.character(5:8),
group_3 = as.character(9:13)
)
config_test would be equal to
> config_test
[[1]]
[1] 1 3 4
[[2]]
[1] 2 6 2
I made some tests, this method should work for any number of groups, of any length, and the output should always be composed of not-duplicated matrices.
I'm sorry if the explanation is short, if I have time in the following days I'll try to add some notes.
EDIT
A simple way to output only the configurations characterized by a specific number of elements from the original groups is to change the create_new_groups as follows
create_new_groups_modified <- function(original_groups, max_output = NULL, elements_from_original = NULL){
min_len_original_groups = min(lengths(original_groups))
num_original_groups = length(original_groups)
max_len_subgroup <- floor(min_len_original_groups/2)
if(min_len_original_groups<2){
stop("Not possible to populate new groups: at least one original group has less than 2 elements")
}
NewGroups_subLen_len_num <- list()
for (len_subgroup in 1:max_len_subgroup){
new_group_params <- c(
len_subgroup,
len_subgroup*num_original_groups,
floor(min_len_original_groups/len_subgroup)
)
NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
}
out_list <- list()
ind <- 1
for (e in 1:length(NewGroups_subLen_len_num)){
NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
elem_list <- list()
ind <- 1
# print(ind)
for (o in 1:length(original_groups)){
original_group <- original_groups[[o]]
elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
ind <- ind+1
}
out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
}
results_list <- list()
config_test <<- NewGroups_subLen_len_num
# if `elements_from_original` is not set, output all possible combinations
if(is.null(elements_from_original)){
for (config_id in 1:length(NewGroups_subLen_len_num)){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
# config_test <<- config
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
# out_test <<- new_groups
results_list[[config_id]] <- new_groups
}
} else {
# if `elements_from_original` is set, check if this is a valid configuration, then output only the matrix having this configuration
config_id <- which(sapply(NewGroups_subLen_len_num,function(x) x[1]==elements_from_original))
if (length(config_id)!=0){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
results_list[[1]] <- new_groups
} else {
stop("value of elements_from_original not available: check config_test to see available configurations")
}
}
return(results_list)
}
The elements_from_original parameter of the function allows to set the number elements from the original groups to consider, and, if applicable, the output will include only matrices following that configuration.
EDIT 2
To output the matrices composed by a specific number of groups
Write a new function select_matrices_by_number_output_groups, that outputs only the matrices with n_output_groups rows
select_matrices_by_number_output_groups <- function(l,n_output_groups){
# Filter out matrices having less rows than `n_output_groups`
out_l <- l[which(
sapply(
l,
# function(x) check_matrix_by_number_output_groups(x,n_output_groups)
function(mtr){
if(nrow(mtr)<n_output_groups) return(F)
else return(T)
}
)
)]
# Cut-off rows from matrices having more rows than `n_output_groups`
out_l <- lapply(
out_l,
function(x) head(x,n_output_groups)
)
# Keep only unique elements (i.e., matrices)
out_l <- unique(out_l)
return(out_l)
}
The update create_new_groups so that it includes the select_matrices_by_number_output_groups function
create_new_groups_modified_2 <- function(original_groups, max_output = NULL, elements_from_original = NULL, n_output_groups = NULL){
min_len_original_groups = min(lengths(original_groups))
num_original_groups = length(original_groups)
max_len_subgroup <- floor(min_len_original_groups/2)
if(min_len_original_groups<2){
stop("Not possible to populate new groups: at least one original group has less than 2 elements")
}
NewGroups_subLen_len_num <- list()
for (len_subgroup in 1:max_len_subgroup){
new_group_params <- c(
len_subgroup,
len_subgroup*num_original_groups,
floor(min_len_original_groups/len_subgroup)
)
NewGroups_subLen_len_num[[len_subgroup]] <- new_group_params
}
out_list <- list()
ind <- 1
for (e in 1:length(NewGroups_subLen_len_num)){
NewGroup_subLen_len_num <- NewGroups_subLen_len_num[[e]]
elem_list <- list()
ind <- 1
# print(ind)
for (o in 1:length(original_groups)){
original_group <- original_groups[[o]]
elem_list[[paste("group",ind)]] <- permutations(original_group,NewGroup_subLen_len_num[1])
ind <- ind+1
}
out_list[[paste(c("subLen","len","numGroups"), NewGroup_subLen_len_num, collapse = " ")]] <- elem_list
}
results_list <- list()
config_test <<- NewGroups_subLen_len_num
# if `elements_from_original` is not set, output all possible combinations
if(is.null(elements_from_original)){
for (config_id in 1:length(NewGroups_subLen_len_num)){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
new_groups <- new_groups[which(sapply(new_groups, nrow) >= config[3])]
results_list[[config_id]] <- new_groups
}
} else {
# if `elements_from_original` is set, check if this is a valid configuration, then output only the matrix having this configuration
config_id <- which(sapply(NewGroups_subLen_len_num,function(x) x[1]==elements_from_original))
if (length(config_id)!=0){
config <- NewGroups_subLen_len_num[[config_id]]
perm_grid <- expand.grid(out_list[[config_id]])
perm_grid <- split(perm_grid,1:nrow(perm_grid))
perm_grid <- lapply(perm_grid,unlist)
perm_grid <- lapply(perm_grid,as.character)
perm_grid <- do.call(rbind, perm_grid)
new_groups <- new_groups_list(perm_grid,verbose = F)
new_groups <- find_matrix(new_groups)
new_groups <- lapply(
new_groups,
function(x) {
dimnames(x) <- NULL
return(x)
}
)
if(is.null(n_output_groups)){
new_groups <- new_groups[which(sapply(new_groups, nrow) == config[3])]
} else if (n_output_groups > config[3]){
stop("value n_output_groups higher than max number of new groups for this configuration: check config_test to see available configurations")
} else {
new_groups <- select_matrices_by_number_output_groups(new_groups,n_output_groups)
}
# results_list[[1]] <- new_groups
results_list <- new_groups
} else {
stop("value of elements_from_original not available: check config_test to see available configurations")
}
}
return(results_list)
}
As mentioned in the comment, there are potentially huge numbers of combinations involved here. However, one approach that works (assuming you have enough time/memory) is as follows. This example is for just the first two elements of your original_groups list, allowing two elements from each group. It would be straightforward to generalise the final map to arbitrary numbers of groups, but this is just to illustrate the principle.
The first map sets up a vector of group indices, padded out with NAs to the length of the group (6) (i.e in this case c(1,2,1,2,NA,NA)), and finds all unique permutations of it. cross then combines each option for the first group with each option for the second, and the final map uses these indices to separate the elements into the two groups.
library(combinat) #for permn function
library(tidyverse) #purrr and dplyr
original_groups <- list(group_1 = as.character(1:6), group_2 = as.character(7:12))
no_items <- 2
no_groups <- length(original_groups) #i.e. 2 in this case
output <- map(original_groups,
~unique(permn(`length<-`(rep(seq_len(no_groups),
no_items),
length(.))))) %>%
cross() %>%
map(~list(c(original_groups$group_1[which(.$group_1 == 1)],
original_groups$group_2[which(.$group_2 == 1)]),
c(original_groups$group_1[which(.$group_1 == 2)],
original_groups$group_2[which(.$group_2 == 2)])))
head(output) #full output has 8100 elements
[[1]]
[[1]][[1]]
[1] "1" "3" "7" "9"
[[1]][[2]]
[1] "2" "4" "8" "10"
[[2]]
[[2]][[1]]
[1] "1" "3" "7" "9"
[[2]][[2]]
[1] "2" "5" "8" "10"
[[3]]
[[3]][[1]]
[1] "1" "4" "7" "9"
[[3]][[2]]
[1] "2" "5" "8" "10"
[[4]]
[[4]][[1]]
[1] "1" "4" "7" "9"
[[4]][[2]]
[1] "3" "5" "8" "10"
[[5]]
[[5]][[1]]
[1] "2" "4" "7" "9"
[[5]][[2]]
[1] "3" "5" "8" "10"
[[6]]
[[6]][[1]]
[1] "2" "4" "7" "9"
[[6]][[2]]
[1] "3" "6" "8" "10"
This is the job of expand.grid + combn: will be showing only the first 5 rows:
n <- 1
expand.grid(lapply(original_groups, combn, n, simplify = FALSE))
group_1 group_2 group_3
1 1 7 13
2 2 7 13
3 3 7 13
4 4 7 13
5 5 7 13
when n = 2
n <- 2
expand.grid(lapply(original_groups, combn, n, simplify = FALSE))
group_1 group_2 group_3
1 1, 2 7, 8 13, 14
2 1, 3 7, 8 13, 14
3 1, 4 7, 8 13, 14
4 1, 5 7, 8 13, 14
5 1, 6 7, 8 13, 14
You can write a simple function:
generate_all <- function(lst, n){
expand.grid(lapply(lst, combn, n, simplify = FALSE))
}
head(generate_all(original_groups, 3))
group_1 group_2 group_3
1 1, 2, 3 7, 8, 9 13, 14, 15
2 1, 2, 4 7, 8, 9 13, 14, 15
3 1, 2, 5 7, 8, 9 13, 14, 15
4 1, 2, 6 7, 8, 9 13, 14, 15
5 1, 3, 4 7, 8, 9 13, 14, 15
head(generate_all(original_groups, 4))
group_1 group_2 group_3
1 1, 2, 3, 4 7, 8, 9, 10 13, 14, 15, 16
2 1, 2, 3, 5 7, 8, 9, 10 13, 14, 15, 16
3 1, 2, 3, 6 7, 8, 9, 10 13, 14, 15, 16
4 1, 2, 4, 5 7, 8, 9, 10 13, 14, 15, 16
5 1, 2, 4, 6 7, 8, 9, 10 13, 14, 15, 16
6 1, 2, 5, 6 7, 8, 9, 10 13, 14, 15, 16
The prototype function below works for any number of groups from which some number ("numobs" in the function) of observations have been uniquely drawn. I have added an example with four groups instead of 3 to illustrate.
The function was written in R version 4.2.1 and data.table version 1.14.2 on a windows operating system. The code is rather sophisticated; changing the position of a single comma, or a single compatibility issue, might render the function inoperable.
In the example given by the poster, there are 15 ways of drawing two distinct members from the first two groups (length 6) and 28 ways of drawing two distinct members from the third group (length 8). There should then be 15 * 15 * 28 = 6300 possible combinations.
I have added two other examples to show that this function works for any given number of groups with k objects taken from each group. However, the code speed is affected as the number of groups and objects increases, and will likely become an issue for large numbers of groups or objects.
Suggested Citation:
Harkness, Jeffrey (2022). Customized R combination function. Posted 7/14/2022 on stackoverflow.com
##original example from poster
(original_groups <- list(group_1 = as.character(1:6),
group_2 = as.character(7:12), group_3 = as.character(13:20)))
$group_1
[1] "1" "2" "3" "4" "5" "6"
$group_2
[1] "7" "8" "9" "10" "11" "12"
$group_3
[1] "13" "14" "15" "16" "17" "18" "19" "20"
#testing 4 groups instead of 3
(original_groups2 <- list(group_1 = as.character(1:6),
group_2 = as.character(7:12), group_3 = as.character(13:20), group_4 = as.character(21:24)))
$group_1
[1] "1" "2" "3" "4" "5" "6"
$group_2
[1] "7" "8" "9" "10" "11" "12"
$group_3
[1] "13" "14" "15" "16" "17" "18" "19" "20"
$group_4
[1] "21" "22" "23" "24"
#testing 4 different groups
(original_groups3 <- list(group_1 = as.character(1:4),
group_2 = as.character(5:9), group_3 = as.character(10:16), group_4 = as.character(17:22)))
$group_1
[1] "1" "2" "3" "4"
$group_2
[1] "5" "6" "7" "8" "9"
$group_3
[1] "10" "11" "12" "13" "14" "15" "16"
$group_4
[1] "17" "18" "19" "20" "21" "22"
require(data.table)
# Loading required package: data.table
# data.table 1.14.2 using 4 threads (see ?getDTthreads). Latest news: r-datatable.com
gc <- function(input_ob = original_groups, numgroups = length(input_ob), numobs = 2) {
ansdim <- NULL
for (k in 1:numgroups) {
rowcount <- 1
temp <- as.vector(input_ob[[k]]) # temporary vector for group k
newcombs <- data.frame(combn(temp, numobs)) # data frame of combinations for group k
newcombs <- transpose(newcombs)
newname <- paste0("ansnew", k)
ansdim[k] <- dim(newcombs)[1]
assign(newname, newcombs)
}
### To hold final answer
nm <- matrix(data = 0, nrow = prod(ansdim), ncol = numobs * numgroups, byrow = T)
## All possible combinations of the first two groups
combine <- NULL
nc <- ansdim[1]
nc <- data.frame(CJ(1:nc, 1:ansdim[2])) # instructions for which row numbers to combine
ntemp <- matrix(data = 0, nrow = ansdim[1] * ansdim[2], ncol = numobs * 2, byrow = T)
for (m in 1:dim(ntemp)[1]) {
newrow <- cbind(ansnew1[nc[m, 1], ], ansnew2[nc[m, 2], ])
ntemp[m, ] <- as.matrix(newrow[1, ])
}
fcom <- ntemp ### all combinations of first two groups
## All possible combinations of all groups
for (n in 3:(numgroups)) {
nc <- ansdim[n]
nc <- data.frame(CJ(1:nrow(fcom), 1:ansdim[n])) # instructions for which row numbers to combine
ntemp <- matrix(data = 0, nrow = nrow(fcom) * ansdim[n], ncol = numobs * (n), byrow = T)
for (p in 1:dim(ntemp)[1]) {
frow <- fcom[nc[p, 1], ] ### First part of new row
srow <- as.character(as.vector(get(paste0("ansnew", n))[nc[p, 2], ])) ## second part of new row
newrow <- c(frow, srow)
ntemp[p, ] <- newrow
}
fcom <- ntemp
}
#
nm <- fcom
return(nm)
}
ans1 <- gc(input_ob = original_groups)
ans2 <- gc(input_ob = original_groups2)
ans3 <- gc(input_ob = original_groups3)
dim(ans1);head(ans1)
[1] 6300 6
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1" "2" "7" "8" "13" "14"
[2,] "1" "2" "7" "8" "13" "15"
[3,] "1" "2" "7" "8" "13" "16"
[4,] "1" "2" "7" "8" "13" "17"
[5,] "1" "2" "7" "8" "13" "18"
[6,] "1" "2" "7" "8" "13" "19"
dim(ans2);head(ans2)
[1] 37800 8
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1" "2" "7" "8" "13" "14" "21" "22"
[2,] "1" "2" "7" "8" "13" "14" "21" "23"
[3,] "1" "2" "7" "8" "13" "14" "21" "24"
[4,] "1" "2" "7" "8" "13" "14" "22" "23"
[5,] "1" "2" "7" "8" "13" "14" "22" "24"
[6,] "1" "2" "7" "8" "13" "14" "23" "24"
dim(ans3);head(ans3)
[1] 18900 8
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "1" "2" "5" "6" "10" "11" "17" "18"
[2,] "1" "2" "5" "6" "10" "11" "17" "19"
[3,] "1" "2" "5" "6" "10" "11" "17" "20"
[4,] "1" "2" "5" "6" "10" "11" "17" "21"
[5,] "1" "2" "5" "6" "10" "11" "17" "22"
[6,] "1" "2" "5" "6" "10" "11" "18" "19"
###Sample random rows from output to illustrate output is consistent
ans1[sample(nrow(ans1), size = 5, replace = F),]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "1" "3" "8" "12" "13" "17"
[2,] "3" "6" "8" "12" "19" "20"
[3,] "1" "3" "7" "11" "15" "19"
[4,] "1" "2" "9" "11" "17" "20"
[5,] "5" "6" "7" "8" "19" "20"
ans2[sample(nrow(ans2), size = 5, replace = F),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "2" "3" "8" "11" "13" "15" "22" "24"
[2,] "1" "2" "8" "9" "13" "19" "23" "24"
[3,] "1" "3" "11" "12" "13" "15" "21" "22"
[4,] "1" "4" "9" "10" "14" "16" "21" "22"
[5,] "2" "5" "7" "9" "15" "20" "22" "24"
ans3[sample(nrow(ans3), size = 5, replace = F),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "3" "4" "6" "7" "10" "15" "21" "22"
[2,] "1" "3" "5" "8" "12" "14" "18" "22"
[3,] "2" "4" "7" "8" "12" "16" "18" "19"
[4,] "2" "3" "5" "9" "11" "13" "17" "21"
[5,] "1" "2" "7" "8" "14" "16" "17" "18"
##shows that the output has no duplicated rows
ans1[duplicated(ans1),]
[,1] [,2] [,3] [,4] [,5] [,6]
In order to show convincingly that all the cases are represented in the output, I wrote a few lines of code that randomly sample two objects from the OP's example groups and finds the corresponding row in the output. This code can easily be looped over.
#The next lines sample from original group and find corresponding row in answer
samp1 = sample(original_groups$group_1,size = 2, replace = F)
samp2 = sample(original_groups$group_2,size = 2, replace = F)
samp3 = sample(original_groups$group_3,size = 2, replace = F)
(samprow = c(samp1, samp2, samp3))
[1] "6" "4" "8" "9" "20" "18"
colnames(ans1) = c("v1","v2","v3","v4","v5","v6")
ans5 = data.frame(ans1)
ans5[ans5$v1 %in% samprow & ans5$v2 %in% samprow &ans5$v3 %in% samprow &ans5$v4 %in% samprow &ans5$v5 %in% samprow &ans5$v6 %in% samprow,]
v1 v2 v3 v4 v5 v6
5627 4 6 8 9 18 20
7/16/2022 Update: Generating all combinations
The following approach was written for the case of 3 groups. The code would require modest modification to output the data in whatever form the user wants. It can be altered to accommodate other group sizes, periodically dump data into an output file to avoid size issues, etc., but I will not do so here for the sake of brevity. Rough calculations suggest that for the example data in "original_groups", it should work in about two hours or less for a reasonably fast laptop. It has been commented out for this reason.
Every unique sampled set (the output from gc()) is represented by a single integer in the code, so they don't have to be copied or repeated over and over. This helps to mitigate data size issues.
The code below writes a data.table where each row gives 3 integers that correspond to three unique sampled sets, like a set of instructions that can be used to construct the final result in whatever form (list, etc.) the user wants with additional code.
After running the code for about 15 minutes, the answer frame had collected 697,152 combinations after removing duplicates. The duplicate removal code is taken from here. All other code is original. Code and example output are shown below.
References:
Stack Overflow commentator Thomas (2014). Accessed 7/16/2022. URL: Deleting reversed duplicates with R
# inans = ans1 #object from gc() function with all combinations
# inans = data.table(inans)
# colvec = colnames(inans)
# inans$index = 1:nrow(inans)
#
# for(i in 1:nrow(inans)){ #For every possible sampled sequence
# sf = NULL;sf2 = NULL;ntemp = NULL; nin = NULL
# ntemp = as.character(as.vector(inans[i,..colvec]))
# nin = inans$index[i]
# sf = inans
# for(k in colvec){ #Remove rows that don't match first element
# sf = sf[!get(k) %in% ntemp]
# }
#
# ntemp2 = NULL; sin = NULL
# for(j in 1:nrow(sf)){ #Loop through all possibilities for element
# ntemp2 = as.character(as.vector(sf[j,..colvec]))
# sin = sf$index[j]
# sf2 = sf
# for(m in colvec){
# sf2 = sf2[!get(m) %in% ntemp2] #Remove rows that don't match second element
# }
# if(dim(sf2)[1] > 0){
# newframe = data.table(nin, sin, sf2$index) #Store row numbers for each combination
# if(i == 1){
# ansframe = newframe
# }else{
# ansframe = rbind(ansframe, newframe) #start process over for next sampled set
# }}}}
#
# ##Remove duplicate groups
# ##Code Source:
# ##https://stackoverflow.com/questions/22756392/deleting-reversed-duplicates-with-r
# ansframe = ansframe[!duplicated(apply(ansframe,1,function(x) paste(sort(x),collapse=''))),]
#
# #input data.table of all possible sample sets
# #6300 total here - made from gc() function above
# #index is an integer to represent each sequence
# head(inans)
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 14 1
# 2: 1 2 7 8 13 15 2
# 3: 1 2 7 8 13 16 3
# 4: 1 2 7 8 13 17 4
# 5: 1 2 7 8 13 18 5
# 6: 1 2 7 8 13 19 6
#
#
# #data.table that holds instructions to build final output
# #each row gives row numbers for a feasible combination
# head(ansframe)
# nin sin V3
# 1: 1 6300 4046
# 2: 1 6300 4047
# 3: 1 6300 4048
# 4: 1 6300 4051
# 5: 1 6300 4052
# 6: 1 6300 4055
#
# #answer frame size: each row represents a combination
# dim(ansframe)
# [1] 697152 3
#
#
# ###Code to print a selection of valid combinations from the
# ###output instructions given by the code above.
# forind = as.integer(seq(1,nrow(ansframe),length.out = 50))
# for(i in forind){print(inans[as.numeric(ansframe[i])])}
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 14 1
# 2: 5 6 11 12 19 20 6300
# 3: 3 4 9 10 15 16 4046
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 9 11 16 17 4079
# 3: 5 6 10 12 14 20 6257
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 9 12 17 20 4113
# 3: 5 6 10 11 18 19 6242
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 10 12 14 19 4156
# 3: 5 6 9 11 16 18 6180
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 4 11 12 17 18 4195
# 3: 5 6 9 10 16 20 6154
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 5 9 11 14 17 4490
# 3: 4 6 10 12 16 18 5844
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 5 9 12 16 19 4529
# 3: 4 6 10 11 17 18 5819
# v1 v2 v3 v4 v5 v6 index
# 1: 1 2 7 8 13 15 2
# 2: 3 5 10 11 18 20 4563
# 3: 4 6 9 12 17 19 5792
7/17/2022 Update: All combinations with variable output group size
The following approach was written for the cases of 2,3,or 4 groups. In tested cases, the code ran in the rough ballpark of 15 minutes or so for two groups but can take longer for 4 groups, depending on the size and number of the input groups.
This example shows 4 input groups with 4 output groups and 1 object from each group, but the user can also choose 2 or 3 output groups. The object "numobs" in the gc() function specifies objects to take from each group and the "groupsout" object specifies the number of output groups. The code below found 13,824 combinations in under ten minutes for this example.
(original_groups4 <- list(group_1 = as.character(1:4), group_2 = as.character(5:8), group_3 = as.character(9:12), group_4 = as.character(13:16)))
$group_1
[1] "1" "2" "3" "4"
$group_2
[1] "5" "6" "7" "8"
$group_3
[1] "9" "10" "11" "12"
$group_4
[1] "13" "14" "15" "16"
ans1 = gc(input_ob = original_groups4, numobs = 1)
###Find all combinations
###groupsout specifies the number of output groups
groupsout = 4 #Number of groups in the output: choose 2,3, or 4
inans = ans1 #dev object from gc() function with all combos
inans = data.table(inans) #order data.table by first few columns
colvec = colnames(inans)
inans$index = 1:nrow(inans)
if(groupsout == 2){ #groupsout = 2 case
for(i in 1:nrow(inans)){ #Remove rows that don't match first element
ntemp = as.character(as.vector(inans[i,..colvec]))
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
if(dim(sf)[1] > 0){
newframe = data.table(nin, sf$index) #Store row numbers for each combination
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}
if(groupsout == 3){ #groupsout = 3 case
for(i in 1:nrow(inans)){ #Remove rows that don't match first element
sf = NULL;sf2 = NULL;ntemp = NULL; nin = NULL
ntemp = as.character(as.vector(inans[i,..colvec]))
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
ntemp2 = NULL; sin = NULL #Create last group and write to output frame
for(j in 1:nrow(sf)){ #Remove rows that don't match second element
ntemp2 = as.character(as.vector(sf[j,..colvec]))
sin = sf$index[j]
sf2 = sf
for(m in colvec){
sf2 = sf2[!get(m) %in% ntemp2]
}
if(dim(sf2)[1] > 0){
newframe = data.table(nin, sin, sf2$index) #Store row numbers for each combination
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}}
###groupsout = 4 case
if(groupsout == 4){
for(i in 1:nrow(inans)){ #Remove rows that don't match first element
sf = NULL;sf2 = NULL;sf3 = NULL
ntemp = NULL; nin = NULL;fin = NULL
ntemp = as.character(as.vector(inans[i,..colvec]))
nin = inans$index[i]
sf = inans
for(k in colvec){
sf = sf[!get(k) %in% ntemp]
}
####Create next group
ntemp2 = NULL; sin = NULL
for(j in 1:nrow(sf)){ #Remove rows that don't match second element
ntemp2 = as.character(as.vector(sf[j,..colvec]))
sin = sf$index[j]
sf2 = sf
for(m in colvec){
sf2 = sf2[!get(m) %in% ntemp2]
}
####Create last group and write to output frame
fin = NULL;ntemp3 = NULL
for(p in 1:nrow(sf2)){ #Remove rows that don't match third element
ntemp3 = as.character(as.vector(sf2[p,..colvec]))
fin = sf2$index[p]
sf3 = sf2
for(r in colvec){
sf3 = sf3[!get(r) %in% ntemp3]
}
if(dim(sf3)[1] > 0){
newframe = data.table(nin, sin,fin, sf3$index) #Store row numbers
if(i == 1){
ansframe = newframe
}else{
ansframe = rbind(ansframe, newframe)
}}}}}}#end if groupsout = 4
##Remove duplicate groups
ansframe = ansframe[!duplicated(apply(ansframe,1,function(x) paste(sort(x),collapse=''))),]
#for illustration
dim(ansframe)
[1] 13824 4
##View subset of combination output to check for valid results
forind = as.integer(seq(1,nrow(ansframe),length.out = 50))
for(i in forind){print(inans[as.numeric(ansframe[i])])}
V1 V2 V3 V4 index
1: 1 5 9 13 1
2: 4 8 12 16 256
3: 3 7 11 15 171
4: 2 6 10 14 86
V1 V2 V3 V4 index
1: 1 5 9 15 3
2: 2 6 12 16 96
3: 3 7 10 14 166
4: 4 8 11 13 249
V1 V2 V3 V4 index
1: 1 5 9 16 4
2: 2 7 12 14 110
3: 3 6 11 15 155
4: 4 8 10 13 245
V1 V2 V3 V4 index
1: 1 5 10 13 5
2: 2 8 12 14 126
3: 3 7 9 16 164
4: 4 6 11 15 219
V1 V2 V3 V4 index
1: 1 5 10 15 7
2: 2 6 11 16 92
3: 3 8 12 14 190
4: 4 7 9 13 225
V1 V2 V3 V4 index
1: 1 5 10 16 8
2: 2 7 11 15 107
3: 3 6 9 14 146
4: 4 8 12 13 253
V1 V2 V3 V4 index
1: 1 5 11 13 9
2: 2 8 10 15 119
3: 3 6 12 16 160
4: 4 7 9 14 226
If I understand your question correctly, it seems to be a problem regarding all possible partitions of each group by a given size and re-organizing the partitions across all groups to form new collections, and keep only one of the isomorphics. In this case, I guess the key step is to generate all exclusive partitions by size, which seems related to permutation problem.
Since a base R option is preferrable to OP, probably we can try the code below:
build a helper function permM, which generate all permutations of vector x with given group size M
create a function f to produce desired output, i.e., all possible combinations of new groups, where all combinations are stored in a nested list
# generate all permuations of x with given size M for each group
permM <- function(x, M) {
if (length(x) == M) {
return(list(x))
}
S <- combn(x, M, simplify = FALSE)
res <- c()
for (k in seq_along(S)) {
z <- Recall(x[!x %in% S[[k]]], M)
res <- c(res, lapply(z, c, S[[k]]))
}
res
}
# create all possible combinations of new groups
f <- function(lst, K) {
nms <- names(lst)
l <- lapply(lst, combn, m = length(lst) * K)
g <- apply(
expand.grid(lapply(choose(lengths(lst), length(lst) * K), seq)),
1,
function(idx) {
Map(function(p, q) l[[p]][, q], seq_along(idx), unlist(idx))
}
)
x <- do.call(
c,
lapply(
g,
function(v) {
apply(
expand.grid(lapply(v, permM, M = K)),
1,
function(...) {
setNames(
asplit(do.call(rbind, lapply(..., matrix, K)), 2),
nms
)
}
)
}
)
)
# remove the isomorphics but keep one of them only
x[
!duplicated(lapply(
x,
function(v) {
unname(sort(sapply(v, function(z) toString(sort(z)))))
}
))
]
}
Example
Given a smaller data sample lst <- list(grp1 = 1:4, grp2 = 5:9) as the original_group list, we run
r1 <- f(lst,1)
r2 <- f(lst,2)
and we will see a snapshot of result like below
> head(r1)
[[1]]
[[1]]$grp1
[1] 2 6
[[1]]$grp2
[1] 1 5
[[2]]
[[2]]$grp1
[1] 1 6
[[2]]$grp2
[1] 2 5
[[3]]
[[3]]$grp1
[1] 3 6
[[3]]$grp2
[1] 1 5
[[4]]
[[4]]$grp1
[1] 1 6
[[4]]$grp2
[1] 3 5
[[5]]
[[5]]$grp1
[1] 4 6
[[5]]$grp2
[1] 1 5
[[6]]
[[6]]$grp1
[1] 1 6
[[6]]$grp2
[1] 4 5
and
> head(r2)
[[1]]
[[1]]$grp1
[1] 3 4 7 8
[[1]]$grp2
[1] 1 2 5 6
[[2]]
[[2]]$grp1
[1] 2 4 7 8
[[2]]$grp2
[1] 1 3 5 6
[[3]]
[[3]]$grp1
[1] 2 3 7 8
[[3]]$grp2
[1] 1 4 5 6
[[4]]
[[4]]$grp1
[1] 1 4 7 8
[[4]]$grp2
[1] 2 3 5 6
[[5]]
[[5]]$grp1
[1] 1 3 7 8
[[5]]$grp2
[1] 2 4 5 6
[[6]]
[[6]]$grp1
[1] 1 2 7 8
[[6]]$grp2
[1] 3 4 5 6