How to create hypercube vertices [duplicate] - r

I am trying to generate all possible combinations of 0 and 1's in a vector of length 14. Is there an easy way of getting that output as a list of vectors, or even better, a dataframe?
To demonstrate better what I am looking for, let's suppose that I only want a vector of length 3. I would like to be able to generate the following:
(1,1,1), (0,0,0), (1,1,0), (1,0,0), (1,0,1), (0,1,0), (0,1,1), (0,0,1)

You're looking for expand.grid.
expand.grid(0:1, 0:1, 0:1)
Or, for the long case:
n <- 14
l <- rep(list(0:1), n)
expand.grid(l)
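Since the question also asks for a list of vectors, here is a minimal sketch of how the expand.grid() result could be converted, shown for n = 3 so the output stays small. The names grid and vertex_list are illustrative and not part of the original answer.

```r
n <- 3
# one column per position, one row per 0/1 combination (2^n rows)
grid <- expand.grid(rep(list(0:1), n))

# split the data frame into a list with one integer vector per row
vertex_list <- lapply(seq_len(nrow(grid)),
                      function(i) unlist(grid[i, ], use.names = FALSE))

nrow(grid)        # 8, i.e. 2^3
vertex_list[[2]]  # 1 0 0: the first column varies fastest
```

For n = 14 the same code should give 16384 rows; the data frame form is usually the more convenient one to keep.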

tidyr has a couple of options similar to expand.grid().
tidyr::crossing() returns a tibble and does not convert strings to factors (though you could do expand.grid(..., stringsAsFactors = FALSE)).
library(tidyr)
crossing(var1 = 0:1, var2 = 0:1, var3 = 0:1)
# A tibble: 8 x 3
var1 var2 var3
<int> <int> <int>
1 0 0 0
2 0 0 1
3 0 1 0
4 0 1 1
5 1 0 0
6 1 0 1
7 1 1 0
8 1 1 1
tidyr::expand() can give either only the combinations of values that appear in the data, like this:
expand(mtcars, nesting(vs, cyl))
# A tibble: 5 x 2
vs cyl
<dbl> <dbl>
1 0 4
2 0 6
3 0 8
4 1 4
5 1 6
or all possible combinations of two variables, even if there isn't an observation with those specific values in the data, like this:
expand(mtcars, vs, cyl)
# A tibble: 6 x 2
vs cyl
<dbl> <dbl>
1 0 4
2 0 6
3 0 8
4 1 4
5 1 6
6 1 8
(You can see that there were no observations in the original data where vs == 1 & cyl == 8)
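The difference between observed and possible combinations can also be checked in base R. This is only a sketch under the assumption that a plain data frame is acceptable; the names observed, possible and never_seen are made up for the illustration.

```r
# combinations that actually occur in mtcars (what nesting() keeps)
observed <- unique(mtcars[, c("vs", "cyl")])

# every pairing of the distinct values (what expand(mtcars, vs, cyl) builds)
possible <- expand.grid(vs  = sort(unique(mtcars$vs)),
                        cyl = sort(unique(mtcars$cyl)))

nrow(observed)  # 5
nrow(possible)  # 6

# the pairing that is possible but never observed
key <- function(d) paste(d$vs, d$cyl)
never_seen <- possible[!key(possible) %in% key(observed), ]
never_seen      # vs = 1, cyl = 8
```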
tidyr::complete() can also be used similarly to expand.grid(). This is an example from the docs:
df <- dplyr::tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  value1 = 1:3,
  value2 = 4:6
)
df %>% complete(group, nesting(item_id, item_name))
# A tibble: 4 x 5
group item_id item_name value1 value2
<dbl> <dbl> <chr> <int> <int>
1 1 1 a 1 4
2 1 2 b 3 6
3 2 1 a NA NA
4 2 2 b 2 5
This gives all possible combinations of item_id and item_name for each group - it creates a line for group=2 item_id=1 and item_name=a.

As an alternative to @Justin's approach, you can also use CJ from the "data.table" package. Here, I've also made use of replicate to create my list of 14 zeroes and ones.
library(data.table)
do.call(CJ, replicate(14, 0:1, FALSE))
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14
# 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 1
# 3: 0 0 0 0 0 0 0 0 0 0 0 0 1 0
# 4: 0 0 0 0 0 0 0 0 0 0 0 0 1 1
# 5: 0 0 0 0 0 0 0 0 0 0 0 1 0 0
# ---
# 16380: 1 1 1 1 1 1 1 1 1 1 1 0 1 1
# 16381: 1 1 1 1 1 1 1 1 1 1 1 1 0 0
# 16382: 1 1 1 1 1 1 1 1 1 1 1 1 0 1
# 16383: 1 1 1 1 1 1 1 1 1 1 1 1 1 0
# 16384: 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Here is a generic approach that works for this whole family of questions. First, let's see how the solutions evolve as the length N increases, to find the general pattern.
First, the solution for length 1 is
0
1
Now for length 2, the solution becomes (2nd column separated by |):
0 | 0 0, 0 1
1 | 1 0, 1 1
Comparing it with the previous solution for length 1, it is clear that we obtain the new solution by appending 0 and 1 to each of the previous solutions (1st column, 0 and 1).
Now for length 3, the solution is (3rd column):
0 | 0 0 | 0 0 0, 0 0 1
1 | 1 0 | 1 0 0, 1 0 1
  | 0 1 | 0 1 0, 0 1 1
  | 1 1 | 1 1 0, 1 1 1
Again, this new solution is obtained by appending 0 and 1 to each of the previous solutions (2nd column, for length 2).
This observation naturally leads to a recursive solution. Assume we have already obtained the solution for length N-1, solution(c(0,1), N-1). To obtain the solution for length N, we simply append 0 and 1 to each item of it: append_each_to_list(solution(c(0,1), N-1), c(0,1)). Notice how a more complex problem (solving N) is naturally decomposed into a simpler one (solving N-1).
Then we just need to translate this plain English to R code almost literally:
# assume you have the solution for a shorter length len-1 -> solution(v, len-1)
# the solution of length len is that shorter solution with each element of v appended
solution <- function(v, len) {
  if (len <= 1) {
    as.list(v)
  } else {
    append_each_to_list(solution(v, len - 1), v)
  }
}
# append each element of vector v to every item of list L and return a flat list
append_each_to_list <- function(L, v) {
  purrr::flatten(lapply(v,
    function(n) lapply(L, function(l) c(l, n))
  ))
}
To call the function:
> solution(c(1,0), 3)
[[1]]
[1] 1 1 1
[[2]]
[1] 0 1 1
[[3]]
[1] 1 0 1
[[4]]
[1] 0 0 1
[[5]]
[1] 1 1 0
[[6]]
[1] 0 1 0
[[7]]
[1] 1 0 0
[[8]]
[1] 0 0 0
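The same idea can be written without recursion and without purrr. The following is a sketch only: solution_iter is a hypothetical name, and unlist(..., recursive = FALSE) is swapped in for purrr::flatten() to remove one level of nesting.

```r
solution_iter <- function(v, len) {
  out <- as.list(v)                       # all vectors of length 1
  for (step in seq_len(max(len - 1, 0))) {
    # append every element of v to every vector built so far
    out <- unlist(
      lapply(v, function(n) lapply(out, function(l) c(l, n))),
      recursive = FALSE
    )
  }
  out
}

res <- solution_iter(c(1, 0), 3)
length(res)  # 8
res[[1]]     # 1 1 1
res[[8]]     # 0 0 0
```

Each pass of the loop plays the role of one level of recursion, so the list grows from 2 to 4 to 8 items.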

There are 16384 possible permutations. You can use the iterpc package to fetch the result iteratively.
library(iterpc)
I <- iterpc(2, 14, labels = c(0, 1), ordered = TRUE, replace = TRUE)
getnext(I)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0
getnext(I)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 1
getnext(I)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 1 0
If you want all results, you can still use getall(I).

Since you are dealing with 0's and 1's, it seems natural to think of integers in terms of bits. Using a function that has been slightly altered from this post (MyIntToBit below), along with your choice of apply functions, we can get the desired result.
MyIntToBit <- function(x, dig) {
  i <- 0L
  string <- numeric(dig)
  while (x > 0) {
    string[dig - i] <- x %% 2L
    x <- x %/% 2L
    i <- i + 1L
  }
  string
}
If you want a list, use lapply like so:
lapply(0:(2^14 - 1), function(x) MyIntToBit(x,14))
If you prefer a matrix, sapply will do the trick (each column is one combination; wrap the call in t() if you want one combination per row):
sapply(0:(2^14 - 1), function(x) MyIntToBit(x,14))
Below are example outputs:
> lapply(0:(2^3 - 1), function(x) MyIntToBit(x,3))
[[1]]
[1] 0 0 0
[[2]]
[1] 0 0 1
[[3]]
[1] 0 1 0
[[4]]
[1] 0 1 1
[[5]]
[1] 1 0 0
[[6]]
[1] 1 0 1
[[7]]
[1] 1 1 0
[[8]]
[1] 1 1 1
> sapply(0:(2^3 - 1), function(x) MyIntToBit(x,3))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 0 0 0 1 1 1 1
[2,] 0 0 1 1 0 0 1 1
[3,] 0 1 0 1 0 1 0 1
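The per-integer while loop can also be replaced by arithmetic on the whole vector of integers at once. This is a sketch under the assumption that one combination per row is wanted; bit_matrix is a hypothetical helper, not part of the answer above.

```r
bit_matrix <- function(n) {
  ints <- 0:(2^n - 1)
  shifts <- (n - 1):0                      # most significant bit first
  # for each bit position, extract that bit from every integer at once
  bits <- sapply(shifts, function(s) (ints %/% 2^s) %% 2)
  matrix(bits, nrow = length(ints))        # keep matrix shape even when n == 1
}

m <- bit_matrix(3)
dim(m)   # 8 3
m[6, ]   # 1 0 1 (the integer 5)
```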

This is a different approach to the previous answers. If you need all possible combinations of 14 values of 1 and 0, it's like generating all possible numbers from 0 to (2^14)-1 and keeping the binary representation of them.
n <- 14
lapply(0:(2^n-1), FUN=function(x) head(as.integer(intToBits(x)),n))
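One detail worth knowing when using intToBits(): it returns the least significant bit first, so each vector reads "backwards" compared with the usual binary notation. A small check, with rev() applied if the conventional order is preferred:

```r
n <- 3
x <- 6L                                          # binary 110
lsb_first <- head(as.integer(intToBits(x)), n)   # lowest bit comes first
lsb_first        # 0 1 1
rev(lsb_first)   # 1 1 0
```

For the purpose of enumerating every 0/1 vector the order does not matter, since all 2^n patterns are produced either way.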

Preface
Many nice answers here. I want to add one for those of us that can't seem to wrap their heads around the provided implementations. The solutions here are essentially generalizations of loops, which is why recursive solutions look so elegant. No one outright wrote it as a loop--I think there are merits to giving the most straight-forward solution, just to trace out what's actually happening.
This is not guaranteed to have great performance--and most of the other answers are more practical. The purpose is to allow you to trace out what's actually happening.
The Math
What we want here is every ordered selection of k elements from a set of n elements, with repetition allowed, so the order of the elements matters ([0, 1] is different from [1, 0]). Strictly speaking these are permutations with repetition rather than combinations, and there are n^k of them.
Ex.
You have three letters, ['a', 'b', 'c'], and want to find all unique ways to arrange two of these letters, allowing letters to be pulled repeatedly (so ['a', 'a'] is allowed). Here n = 3 and k = 2: we have three things and want to find all of the different ways to pick two of them. There are 9 ways to make this selection (n^k = 3^2).
The Code
As mentioned, the simplest solution requires a whole lotta loops.
Keep adding loops and values to select from as your value of k increases.
set <- c("a", "b", "c")
n <- length(set)
# k = 1
# There are only three ways to pick one thing from a selection of three items!
sprintf("Number of combinations:%4d", n^1)
for(i in seq_along(set)){
  print(paste(set[i]))
}
# k = 2
sprintf("Number of combinations:%4d", n^2)
for(i in seq_along(set)){
  for(j in seq_along(set)){
    print(paste(set[i], set[j]))
  }
}
# k = 3
sprintf("Number of combinations:%4d", n^3)
for(i in seq_along(set)){
  for(j in seq_along(set)){
    for(k in seq_along(set)){
      print(paste(set[i], set[j], set[k]))
    }
  }
}
# See the pattern? The value of k corresponds
# to the number of loops and to the number of
# indexes on `set`
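The nested loops can be collapsed into a single loop for any k by keeping a vector of indices and advancing it like an odometer. This is a sketch rather than a tuned implementation; all_selections is a hypothetical name introduced for the illustration.

```r
all_selections <- function(set, k) {
  n <- length(set)
  idx <- rep(1L, k)                  # one index per "loop", all starting at 1
  out <- vector("list", n^k)
  for (row in seq_len(n^k)) {
    out[[row]] <- set[idx]
    # advance the odometer: bump the rightmost index, carrying to the left
    pos <- k
    while (pos >= 1L) {
      if (idx[pos] < n) {
        idx[pos] <- idx[pos] + 1L
        break
      }
      idx[pos] <- 1L
      pos <- pos - 1L
    }
  }
  out
}

res <- all_selections(c("a", "b", "c"), 2)
length(res)  # 9
res[[4]]     # "b" "a"
```

The rightmost index changes fastest, which mirrors the innermost loop in the hand-written version.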

A purrr solution with cross() and its variant (both are deprecated as of purrr 1.0.0 in favour of tidyr::expand_grid()):
library(purrr)
cross(list(0:1, 0:1, 0:1)) %>% simplify_all()
# [[1]]
# [1] 0 0 0
#
# [[2]]
# [1] 1 0 0
#
# [[3]]
# [1] 0 1 0
#
# ...
#
# [[8]]
# [1] 1 1 1
cross_df(list(var1 = 0:1, var2 = 0:1, var3 = 0:1))
# # A tibble: 8 × 3
# var1 var2 var3
# <int> <int> <int>
# 1 0 0 0
# 2 1 0 0
# 3 0 1 0
# 4 1 1 0
# 5 0 0 1
# 6 1 0 1
# 7 0 1 1
# 8 1 1 1
With dplyr, you could use full_join(x, y, by = character()) to perform a cross-join, generating all combinations of x and y (dplyr 1.1.0 and later provide cross_join() for this and deprecate the by = character() form).
library(dplyr)
Reduce(\(x, y) full_join(x, y, by = character()),
       list(tibble(var1 = 0:1), tibble(var2 = 0:1), tibble(var3 = 0:1)))
# # A tibble: 8 × 3
# var1 var2 var3
# <int> <int> <int>
# 1 0 0 0
# 2 0 0 1
# 3 0 1 0
# 4 0 1 1
# 5 1 0 0
# 6 1 0 1
# 7 1 1 0
# 8 1 1 1
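The same cross-join idea can be sketched in base R, since merge() returns the Cartesian product when by is NULL. The names parts and grid are illustrative, and the row order may differ from the dplyr output.

```r
parts <- list(data.frame(var1 = 0:1),
              data.frame(var2 = 0:1),
              data.frame(var3 = 0:1))

# merging with by = NULL pairs every row of x with every row of y
grid <- Reduce(function(x, y) merge(x, y, by = NULL), parts)

nrow(grid)   # 8
names(grid)  # "var1" "var2" "var3"
```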

A minimal reproducible example that lists every non-empty subset of a vector, using combn():
x <- c("red", "blue", "black")
do.call(c, lapply(seq_along(x), combn, x = x, simplify = FALSE))
# [[1]]
# [1] "red"
#
# [[2]]
# [1] "blue"
#
# [[3]]
# [1] "black"
#
# [[4]]
# [1] "red" "blue"
#
# [[5]]
# [1] "red" "black"
#
# [[6]]
# [1] "blue" "black"
#
# [[7]]
# [1] "red" "blue" "black"
All credit goes to @RichScriven
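The subsets produced by combn() correspond one-to-one with the 0/1 vectors from the question: each vector marks which elements are included. A sketch of that link, using logical masks from expand.grid() (the names masks and subsets are made up here):

```r
x <- c("red", "blue", "black")

# one row per TRUE/FALSE pattern of length(x), i.e. 2^3 = 8 masks
masks <- expand.grid(rep(list(c(FALSE, TRUE)), length(x)))

# use each mask to pick the elements it marks as TRUE
subsets <- lapply(seq_len(nrow(masks)), function(i) x[unlist(masks[i, ])])

# drop the empty subset (the all-FALSE mask) to match the combn() result
subsets <- subsets[lengths(subsets) > 0]
length(subsets)  # 7
```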

Related

how to transform data frame to simultaneous equations in R

I have this matrix.
mat <- c("A","NODATA","NODATA","NODATA","A","B","C","NODATA","A","B","C","NODATA","D","E","A","NODATA","D","B","A","NODATA")
mat2 <- matrix(mat, nrow = 4, ncol = 5)
mat3 <- t(mat2)
colnames(mat3) <- c("col1","col2","col3","col4")
mat3
col1 col2 col3 col4
[1,] "A" "NODATA" "NODATA" "NODATA"
[2,] "A" "B" "C" "NODATA"
[3,] "A" "B" "C" "NODATA"
[4,] "D" "E" "A" "NODATA"
[5,] "D" "B" "A" "NODATA"
I want to convert it to the data frame below in R.
A B C D E NODATA
1 0 0 0 0 1
1 1 1 0 0 1
1 1 1 0 0 1
1 0 0 1 1 1
1 1 0 1 0 1
Any ideas?
Thank you.
library(dplyr)
library(tidyr)  # for pivot_wider()
data.frame(rows=seq_len(nrow(mat3))[row(mat3)], values=c(mat3)) %>%
  mutate(a=1) %>%
  pivot_wider(id_cols="rows", names_from="values", values_from="a", values_fn=list(a=length)) %>%
  mutate_all(~ +!is.na(.)) %>%
  select(-rows) %>%
  select(sort(colnames(.)))
# # A tibble: 5 x 6
# A B C D E NODATA
# <int> <int> <int> <int> <int> <int>
# 1 1 0 0 0 0 1
# 2 1 1 1 0 0 1
# 3 1 1 1 0 0 1
# 4 1 0 0 1 1 1
# 5 1 1 0 1 0 1
The first line (data.frame(...)) was suggested by https://stackoverflow.com/a/26838774/3358272.
Here is a base R approach. We first create a matrix of zeroes with one row per row of the original matrix and one column per unique value in it. Then we convert the original matrix to pairs of "coordinates" (row, column-name pairs) that indicate where a 1 should be placed, and assign 1 at those positions.
mat3_pairs <- cbind(c(row(mat3)), c(mat3))
new_mat <- matrix(rep(0, length(unique(mat3_pairs[,2])) * nrow(mat3)), nrow = nrow(mat3))
colnames(new_mat) <- sort(unique(mat3_pairs[, 2]))
rownames(new_mat) <- as.character(1:nrow(mat3))
new_mat[mat3_pairs] <- 1
new_mat
Output
A B C D E NODATA
1 1 0 0 0 0 1
2 1 1 1 0 0 1
3 1 1 1 0 0 1
4 1 0 0 1 1 1
5 1 1 0 1 0 1
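A shorter base R variant, offered as a sketch: table() can count how often each value occurs in each row, and comparing the counts with zero turns them into 0/1 indicators. The matrix is rebuilt here so the block runs on its own; counts and indicator are illustrative names.

```r
mat3 <- matrix(c("A", "NODATA", "NODATA", "NODATA",
                 "A", "B",      "C",      "NODATA",
                 "A", "B",      "C",      "NODATA",
                 "D", "E",      "A",      "NODATA",
                 "D", "B",      "A",      "NODATA"),
               ncol = 4, byrow = TRUE,
               dimnames = list(NULL, paste0("col", 1:4)))

# cross-tabulate row number against cell value (counts, e.g. NODATA = 3 in row 1)
counts <- table(row(mat3), mat3)

# collapse counts to presence/absence
indicator <- (counts > 0) * 1L
indicator
```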

How to remove duplicate values from different rows per unique identifier?

I'm just starting to use R. I have a dataset with unique identifiers (1958 patients) in the first column and 0's and 1's in columns 2-35.
For example:
Patient A: 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 NA NA
I want to change this to:
Patient A: 0 1 0 1 0 1
Thanks in advance.
We can use tapply, grouping our variable based on whether it changes value or not, i.e.
tapply(x[!is.na(x)], cumsum(c(TRUE, diff(x[!is.na(x)]) != 0)), FUN = unique)
#1 2 3 4 5 6
#0 1 0 1 0 1
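For a single vector, base R's rle() (run length encoding) gives the same result more directly. A sketch using the Patient A values from the question; the exact number of trailing 1's is an assumption and does not affect the outcome.

```r
# Patient A: 0 1 0, six 1's, five 0's, a long run of 1's, then two NAs
x <- c(0, 1, 0, rep(1, 6), rep(0, 5), rep(1, 20), NA, NA)

# drop the NAs, then keep one value per run of identical values
collapsed <- rle(x[!is.na(x)])$values
collapsed  # 0 1 0 1 0 1
```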
Based on your example, it is not clear whether NA's can also occur in the middle, and how you would want to deal with that situation: either turn 1 NA 1 into 1 1 and hence combine the two 1's (option 1), or treat NA as a boundary and keep both 1's (option 2).
That determines at which point to remove NA's in the code.
You could use S4Vectors run length encoding, which would allow you to have more than just 0 and 1.
library(S4Vectors)
## create example data
set.seed(1)
x <- sample(c(0,1), (1958*34), replace=TRUE, prob=c(.4, .6))
x[sample(length(x), 200)] <- NA
x <- matrix(x, nrow=1958, ncol=34)
df <- data.frame(patient.id = paste0("P", seq_len(1958)), x, stringsAsFactors = FALSE)
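## define functions to handle NA values
# option 2: NA marks a boundary between runs
fun.NA.boundary <- function(x) {
  a <- runValue(Rle(x))
  a[!is.na(a)]
}
# option 1: drop NA values first, so the runs on either side merge
fun.NA.remove <- function(x) runValue(Rle(x[!is.na(x)]))
## calculate results (run only one of the two; the second overwrites the first)
## note: x has no ID column, so x[,-1] drops the first measurement; use x to keep all 34
# option 2
reslist <- apply(x[,-1], 1, function(y) fun.NA.boundary(y))
# option 1
reslist <- apply(x[,-1], 1, function(y) fun.NA.remove(y))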
names(reslist) <- df$patient.id
head(reslist)
#> $P1
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#>
#> $P2
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P3
#> [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P4
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
#>
#> $P5
#> [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
#>
#> $P6
#> [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
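If installing S4Vectors is not an option, the two ways of treating NA can be sketched with base R's rle(), which puts each NA in a run of its own. The tiny data frame and the helper names drop_na_first and na_as_boundary are made up for the illustration.

```r
df <- data.frame(patient.id = c("P1", "P2"),
                 t1 = c(0, 1), t2 = c(1, NA), t3 = c(1, 1), t4 = c(NA, 0),
                 stringsAsFactors = FALSE)
measurements <- as.matrix(df[, -1])   # drop the ID column only

# remove NAs before encoding, so 1 NA 1 collapses to a single 1
drop_na_first <- function(x) rle(x[!is.na(x)])$values

# encode first, then drop the NA runs, so 1 NA 1 stays 1 1
na_as_boundary <- function(x) {
  v <- rle(x)$values
  v[!is.na(v)]
}

rows <- seq_len(nrow(measurements))
res_drop <- lapply(rows, function(i) drop_na_first(unname(measurements[i, ])))
res_boundary <- lapply(rows, function(i) na_as_boundary(unname(measurements[i, ])))
names(res_drop) <- names(res_boundary) <- df$patient.id

res_drop$P2      # 1 0
res_boundary$P2  # 1 1 0
```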

Generate table using all unique combinations from three columns with data.table in R [duplicate]


R - Convert list of list into data frame

I have the following issue: I am trying to convert this list of lists into a data frame where each unique element of the list is a column of its own.
This is what I have right now:
> head(data$egg_groups)
[[1]]
name resource_uri
1 Plant /api/v1/egg/7/
2 Monster /api/v1/egg/1/
[[2]]
name resource_uri
1 Plant /api/v1/egg/7/
2 Monster /api/v1/egg/1/
[[3]]
name resource_uri
1 Plant /api/v1/egg/7/
2 Monster /api/v1/egg/1/
[[4]]
name resource_uri
1 Dragon /api/v1/egg/14/
2 Monster /api/v1/egg/1/
[[5]]
name resource_uri
1 Dragon /api/v1/egg/14/
2 Monster /api/v1/egg/1/
[[6]]
name resource_uri
1 Dragon /api/v1/egg/14/
2 Monster /api/v1/egg/1/
What I would like to have is a data frame where each one of those entries (just the name) is a column of its own.
Something like this:
  Plant Monster Dragon
1     1       1
2     1       1
3     1       1
4             1      1
5             1      1
6             1      1
I have tried the plyr library and unlist, and so far nothing has worked. Any tips would be appreciated. Thanks
EDIT: This is the dput pastebin link:
dput
I would suggest using mtabulate from the "qdapTools" package. First, just loop through the list and extract the relevant column as a vector, and use the resulting list as the input for mtabulate, something like this:
library(qdapTools)
head(mtabulate(lapply(L, `[[`, "name")))
# Bug Ditto Dragon Fairy Flying Ground Human-like Indeterminate Mineral Monster
# 1 0 0 0 0 0 0 0 0 0 1
# 2 0 0 0 0 0 0 0 0 0 1
# 3 0 0 0 0 0 0 0 0 0 1
# 4 0 0 1 0 0 0 0 0 0 1
# 5 0 0 1 0 0 0 0 0 0 1
# 6 0 0 1 0 0 0 0 0 0 1
# Plant Undiscovered Water1 Water2 Water3
# 1 1 0 0 0 0
# 2 1 0 0 0 0
# 3 1 0 0 0 0
# 4 0 0 0 0 0
# 5 0 0 0 0 0
# 6 0 0 0 0 0
You can use rbindlist() from data.table v1.9.5 as follows:
(Using @lukeA's example)
require(data.table) # 1.9.5+
dt = rbindlist(l, idcol="id")
# id x y
# 1: 1 a 1
# 2: 1 b 2
# 3: 2 b 2
# 4: 2 c 3
dcast(dt, id ~ x, fun.aggregate = length)
# id a b c
# 1: 1 1 1 0
# 2: 2 0 1 1
You can install it by following the instructions here.
Here's one way to do it:
(l <- list(data.frame(x = letters[1:2], y = 1:2), data.frame(x = letters[2:3], y = 2:3)))
# [[1]]
# x y
# 1 a 1
# 2 b 2
#
# [[2]]
# x y
# 1 b 2
# 2 c 3
df <- do.call(rbind, lapply(1:length(l), function(x) cbind(l[[x]], id = x) ))
# x y id
# 1 a 1 1
# 2 b 2 1
# 3 b 2 2
# 4 c 3 2
library(reshape2)
dcast(df, id~x, fun.aggregate = function(x) if (length(x)) "1" else "" )[-1]
#   a b c
# 1 1 1
# 2   1 1
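A base R sketch of the same reshaping, assuming each list element is a data frame with a name column as in the question. The two-element list l below is a made-up stand-in for data$egg_groups.

```r
l <- list(data.frame(name = c("Plant", "Monster"), stringsAsFactors = FALSE),
          data.frame(name = c("Dragon", "Monster"), stringsAsFactors = FALSE))

# one id per list element, repeated once for every row of that element
ids <- rep(seq_along(l), vapply(l, nrow, integer(1)))
names_long <- unlist(lapply(l, function(d) as.character(d$name)))

# cross-tabulate element id against name; cells are 0/1 when names are unique per element
indicator <- table(id = ids, name = names_long)
as.data.frame.matrix(indicator)
#   Dragon Monster Plant
# 1      0       1     1
# 2      1       1     0
```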

Generate list of all possible combinations of elements of vector

