R: How to permute all rows of a data frame such that all possible combinations of rows are returned in a list?

I'm trying to produce all possible row permutations of a data frame (or matrix, if that's easier) and have the result returned as a list or array of the data frames/matrices. I've constructed a mock data frame that has the same dimensions as the one I'm working with.
test.df <- as.data.frame(matrix(1:80,nrow=16,ncol=5))
Edit: changed combinations to permutations

v.df <- data.frame(symbol = c("a", "b", "c"), number = c(1,2,3))
v.df
## symbol number
## 1 a 1
## 2 b 2
## 3 c 3
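library(gtools) # permutations() is assumed to come from the gtools package
permutate.rows <- function(df) {
  k <- nrow(df) # number of rows
  # each column of index.df holds one permutation of the row indices
  index.df <- as.data.frame(t(permutations(n = k, r = k, v = 1:k)))
  # return the list visibly so that calling the function prints it
  lapply(index.df, function(idx) df[idx, , drop = FALSE])
}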
permutate.rows(v.df)
This gives the list of all permuted data frames:
$V1
symbol number
1 a 1
2 b 2
3 c 3
$V2
symbol number
1 a 1
3 c 3
2 b 2
$V3
symbol number
2 b 2
1 a 1
3 c 3
$V4
symbol number
2 b 2
3 c 3
1 a 1
$V5
symbol number
3 c 3
1 a 1
2 b 2
$V6
symbol number
3 c 3
2 b 2
1 a 1
To apply this to your example, pass your own data frame, which has 16 rows instead of 3.
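If you would rather not depend on a package for the index permutations, the same idea can be sketched in base R. This is only an illustration: all_perms() and permute_rows() are hypothetical helper names, not functions from any library, and the recursion is only practical for a small number of rows.

```r
# Hypothetical helper: all permutations of the vector v, one per row of a matrix
all_perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  # fix each element in first position, then permute the remaining elements
  do.call(rbind, lapply(seq_along(v), function(i) {
    cbind(v[i], all_perms(v[-i]))
  }))
}

# Hypothetical helper: list of data frames, one per row permutation
permute_rows <- function(df) {
  idx <- all_perms(seq_len(nrow(df)))
  lapply(seq_len(nrow(idx)), function(k) df[idx[k, ], , drop = FALSE])
}

v.df <- data.frame(symbol = c("a", "b", "c"), number = c(1, 2, 3))
res <- permute_rows(v.df)
length(res) # 6, i.e. factorial(3)
```

Each list element is the original data frame with its rows reordered, so the original row names are kept and show which permutation was applied.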

I shortened the data frame to 5 rows because 16! = 20922789888000 permutations is far too many to enumerate.
library(purrr)
library(combinat)
test.df <- as.data.frame(matrix(1:25,nrow=5,ncol=5))
map(permn(1:nrow(test.df)), function(x) test.df[x,])

Related

R: all combinations of nested list of variable length [duplicate]

I'm not sure if permutations is the correct word for this. Given a set of n vectors (e.g. [1,2], [3,4] and [2,3]), I want to permute them all and get an output of
[1,3,2],[1,3,3],[1,4,2],[1,4,3],[2,3,2] etc.
Is there an operation in R that will do this?
This is a useful case for storing the vectors in a list and using do.call() to arrange for an appropriate function call for you. expand.grid() is the standard function you want. But so you don't have to type out or name individual vectors, try:
> l <- list(a = 1:2, b = 3:4, c = 2:3)
> do.call(expand.grid, l)
a b c
1 1 3 2
2 2 3 2
3 1 4 2
4 2 4 2
5 1 3 3
6 2 3 3
7 1 4 3
8 2 4 3
However, for all my cleverness, it turns out that expand.grid() accepts a list:
> expand.grid(l)
a b c
1 1 3 2
2 2 3 2
3 1 4 2
4 2 4 2
5 1 3 3
6 2 3 3
7 1 4 3
8 2 4 3
This is what expand.grid does.
Quoting from the help page: Create a data frame from all combinations of the supplied vectors or factors. The result is a data.frame with a row for each combination.
expand.grid(
c(1, 2),
c(3, 4),
c(2, 3)
)
Var1 Var2 Var3
1 1 3 2
2 2 3 2
3 1 4 2
4 2 4 2
5 1 3 3
6 2 3 3
7 1 4 3
8 2 4 3
As an alternative to expand.grid() you could use rep() to produce the desired combination. Consider the following simplified example using the original data from this question:
a <- c(1,2)
b <- c(3,4)
c <- c(2,3)
To get the expand.grid()-like effect, use rep() on the first vector with a times= argument equal to the product of the lengths of the other vectors (here 4). The middle vector uses a nested rep(), with each= set to the product of the lengths of the vectors before it and times= set to the product of the lengths of the vectors after it (here 2 and 2). The last vector is handled like the first, but with an each= argument so that the pattern lines up. This is trivial to calculate when each vector has length 2. Example:
#tibble of all combinations of a, b and c
tibble::tibble(
var1 = rep(a, times = 4),
var2 = rep(rep(b, each= 2), times = 2), #nested rep()
var3 = rep(c, each= 4)
)
For an unknown number of input vectors (or unknown vector lengths), we can get all combinations with rep() in a function like this. Note that here the first vector varies slowest, so the row order differs from expand.grid():
# Produces a tibble of all combinations of the input vectors
expand_tibble <- function(...) {
  x <- list(...)    # all input vectors stored here
  len <- lengths(x) # length of each input vector
  n <- length(x)    # total input vector count
  r <- vector("list", n)
  for (i in seq_len(n)) {
    before <- prod(len[seq_len(i - 1)])     # product of lengths to the left (1 if none)
    after <- prod(len[seq_len(n - i) + i])  # product of lengths to the right (1 if none)
    # repeat each value over the vectors to the right, then cycle over those to the left
    r[[i]] <- rep(rep(x[[i]], each = after), times = before)
  }
  names(r) <- paste0("var", seq_len(n))
  tibble::as_tibble(r)
}
output:
expand_tibble(a,b,c)
var1 var2 var3
<dbl> <dbl> <dbl>
1 1 3 2
2 1 3 3
3 1 4 2
4 1 4 3
5 2 3 2
6 2 3 3
7 2 4 2
8 2 4 3
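The each=/times= rule described above is easy to get wrong once the vectors have different lengths, so it can help to check it against expand.grid() directly. The following is a minimal sketch in base R; rep_grid() is a hypothetical helper name, and it uses the expand.grid() ordering (first vector varies fastest) rather than the ordering of expand_tibble().

```r
# Hypothetical helper: rep()-based grid with expand.grid() row order
rep_grid <- function(...) {
  x <- list(...)
  len <- lengths(x)
  n <- length(x)
  cols <- lapply(seq_len(n), function(i) {
    before <- prod(len[seq_len(i - 1)])    # lengths of vectors to the left
    after <- prod(len[seq_len(n - i) + i]) # lengths of vectors to the right
    rep(rep(x[[i]], each = before), times = after)
  })
  names(cols) <- paste0("Var", seq_len(n))
  as.data.frame(cols)
}

# Vectors of unequal length (2, 3 and 4 elements)
g1 <- rep_grid(1:2, 3:5, c(2, 3, 7, 9))
g2 <- expand.grid(1:2, 3:5, c(2, 3, 7, 9))

# Compare column by column
same <- all(mapply(identical, g1, g2))
```

If same is TRUE, the hand-rolled pattern reproduces expand.grid() for these inputs; in practice expand.grid() itself remains the simpler choice.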

How can I loop a data matrix in R?

I am trying to loop over a data matrix for each separate ID tag, “1”, “2” and “3” (see my data at the bottom). Ultimately I am doing this to transform the X and Y coordinates into a time series with the ts() function, but first I need to build a loop into the function that returns a time series for each separate ID. The looping itself works perfectly fine when I use the following code for a data frame:
for(i in 1:3){
print(na.omit(xyframe[ID==i,]))
}
Returning the following output:
Timestamp X Y ID
1. 0 -34.012 3.406 1
2. 100 -33.995 3.415 1
3. 200 -33.994 3.427 1
Timestamp X Y ID
4. 0 -34.093 3.476 2
5. 100 -34.145 3.492 2
6. 200 -34.195 3.506 2
Timestamp X Y ID
7. 0 -34.289 3.522 3
8. 100 -34.300 3.520 3
9. 200 -34.303 3.517 3
Yet, when I want to produce a loop in a matrix with the same code:
for(i in 1:3){
print(na.omit(xymatrix[ID==i,]))
}
It returns the following error:
Error in print(na.omit(xymatrix[ID == i, ]) :
(subscript) logical subscript too long
Why does looping over the ID not work for a matrix when it does work for the data frame, and how can I fix it?
Furthermore, I read that looping requires much more computational power than doing the same thing in a vectorized way. Is there a way to do this vectorized?
The data (simplification of the real data):
Timestamp X Y ID
1. 0 -34.012 3.406 1
2. 100 -33.995 3.415 1
3. 200 -33.994 3.427 1
4. 0 -34.093 3.476 2
5. 100 -34.145 3.492 2
6. 200 -34.195 3.506 2
7. 0 -34.289 3.522 3
8. 100 -34.300 3.520 3
9. 200 -34.303 3.517 3
Subsetting with xymatrix[ID==i,] doesn't work for a matrix. Index by the matrix's own ID column instead:
for(i in 1:3){ print(na.omit(xymatrix[xymatrix[,'ID'] == i,])) }
In general, if you want to apply a function to a data frame, split by some factor, then you should be using one of the apply family of functions in combination with split.
Here's some reproducible sample data.
n <- 20
some_data <- data.frame(
x = sample(c(1:5, NA), n, replace= TRUE),
y = sample(c(letters[1:5], NA), n, replace= TRUE),
grp = gl(3, 1, length = n)
)
If you want to print out the rows with no missing values, split by each ID level, then you want something like this.
lapply(split(some_data, some_data$grp), na.omit)
or more concisely using the plyr package.
library(plyr)
dlply(some_data, .(grp), na.omit)
Both methods return output like this
# $`1`
# x y grp
# 1 2 d 1
# 4 3 e 1
# 7 3 c 1
# 10 4 a 1
# 13 2 e 1
# 16 3 a 1
# 19 1 d 1
# $`2`
# x y grp
# 2 1 e 2
# 5 3 e 2
# 8 3 b 2
# $`3`
# x y grp
# 6 3 c 3
# 9 5 a 3
# 12 2 c 3
# 15 2 d 3
# 18 4 a 3
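The same split-then-apply idea also works on the matrix from the question without an explicit for loop. The sketch below rebuilds the example data by hand and assumes the observations within each ID are already in time order and regularly spaced, which is what ts() expects.

```r
# Rebuild the example data from the question as a numeric matrix
xymatrix <- cbind(
  Timestamp = rep(c(0, 100, 200), times = 3),
  X = c(-34.012, -33.995, -33.994, -34.093, -34.145, -34.195,
        -34.289, -34.300, -34.303),
  Y = c(3.406, 3.415, 3.427, 3.476, 3.492, 3.506, 3.522, 3.520, 3.517),
  ID = rep(1:3, each = 3)
)

# Split the row indices by ID, then subset the matrix once per group
rows_by_id <- split(seq_len(nrow(xymatrix)), xymatrix[, "ID"])
by_id <- lapply(rows_by_id, function(idx) {
  na.omit(xymatrix[idx, , drop = FALSE])
})

# One multivariate time series (X and Y) per ID
series <- lapply(by_id, function(m) ts(m[, c("X", "Y")]))
```

Splitting the row indices rather than the matrix itself keeps each piece a matrix with its column names, so the X and Y columns can still be selected by name.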

Convert a matrix with dimnames into a long format data.frame

Hoping there's a simple answer here but I can't find it anywhere.
I have a numeric matrix with row names and column names:
# 1 2 3 4
# a 6 7 8 9
# b 8 7 5 7
# c 8 5 4 1
# d 1 6 3 2
I want to melt the matrix to a long format, with the values in one column and matrix row and column names in one column each. The result could be a data.table or data.frame like this:
# col row value
# 1 a 6
# 1 b 8
# 1 c 8
# 1 d 1
# 2 a 7
# 2 b 7
# 2 c 5
# 2 d 6
...
Any tips appreciated.
Use melt from reshape2:
library(reshape2)
#Fake data
x <- matrix(1:12, ncol = 3)
colnames(x) <- letters[1:3]
rownames(x) <- 1:4
x.m <- melt(x)
x.m
Var1 Var2 value
1 1 a 1
2 2 a 2
3 3 a 3
4 4 a 4
...
The as.table and as.data.frame functions together will do this:
> m <- matrix( sample(1:12), nrow=4 )
> dimnames(m) <- list( One=letters[1:4], Two=LETTERS[1:3] )
> as.data.frame( as.table(m) )
One Two Freq
1 a A 7
2 b A 2
3 c A 1
4 d A 5
5 a B 9
6 b B 6
7 c B 8
8 d B 10
9 a C 11
10 b C 12
11 c C 3
12 d C 4
Assuming 'm' is your matrix...
data.frame(col = rep(colnames(m), each = nrow(m)),
row = rep(rownames(m), ncol(m)),
value = as.vector(m))
This executes extremely fast on a large matrix and also shows you a bit about how a matrix is made, how to access things in it, and how to construct your own vectors.
A modification that doesn't require you to know anything about the storage structure, and that easily extends to higher-dimensional arrays if you use the dimnames() and slice.index() functions:
data.frame(row=rownames(m)[as.vector(row(m))],
col=colnames(m)[as.vector(col(m))],
value=as.vector(m))
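The extension to higher-dimensional arrays mentioned above could look roughly like the following. This is a sketch, not a tested general-purpose tool: array_to_long() is a hypothetical helper name, and it assumes every dimension of the array has dimnames.

```r
# Hypothetical helper: melt an array of any dimension into long format
array_to_long <- function(arr) {
  dn <- dimnames(arr)
  # every dimension must carry names for the lookup below to work
  stopifnot(!is.null(dn), !any(vapply(dn, is.null, logical(1))))
  idx <- lapply(seq_along(dim(arr)), function(k) {
    # slice.index() gives, for each cell, its position along dimension k
    dn[[k]][as.vector(slice.index(arr, k))]
  })
  names(idx) <- if (is.null(names(dn))) paste0("dim", seq_along(dn)) else names(dn)
  data.frame(idx, value = as.vector(arr), stringsAsFactors = FALSE)
}

arr <- array(1:24, dim = c(2, 3, 4),
             dimnames = list(r = c("a", "b"),
                             c = c("x", "y", "z"),
                             s = as.character(1:4)))
long <- array_to_long(arr)
head(long, 3)
```

For a two-dimensional matrix this reduces to the row()/col() version above, since slice.index(m, 1) and slice.index(m, 2) hold the same indices as row(m) and col(m).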
