Convert 3D array to tidy data frame? - r

I have a 3D array that looks like this:
# Create two vectors
vector1 <- c(1,2,3,4,5,6)
vector2 <- c(10, 11, 12, 13, 14, 15,16)
# Convert to 3D array
my_array <- array(c(vector1, vector2), dim = c(2,3,2))
print(my_array)
where the output is
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 10 12 14
[2,] 11 13 15
I would like to turn this into a tidy dataset with one row per value and four columns:
the value itself
dimension 1
dimension 2
dimension 3
For example, a few rows would be:
Value Dimension1(Row) Dimension2(Column) Dimension3(Width)
1 1 1 1
2 2 1 1
...
15 2 3 2
Is there a good way to do this in base R, or with tidyverse tools like tidyr?

We could use reshape2::melt
library(reshape2)
melt(my_array)
Output:
Var1 Var2 Var3 value
1 1 1 1 1
2 2 1 1 2
3 1 2 1 3
4 2 2 1 4
5 1 3 1 5
6 2 3 1 6
7 1 1 2 10
8 2 1 2 11
9 1 2 2 12
10 2 2 2 13
11 1 3 2 14
12 2 3 2 15
Or use as.data.frame.table in base R
as.data.frame.table(my_array)
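as.data.frame.table() fills in missing dimnames with letters (A, B, C, ...), returns the index columns as factors, and names the value column Freq by default. If you want integer indices and a column called value, as in the desired output, a minimal base-R sketch (my_array is rebuilt from the question so it runs on its own) is:

```r
# Rebuild the example array from the question (same 12 values)
my_array <- array(c(1:6, 10:15), dim = c(2, 3, 2))

# Without dimnames, each dimension is labelled A, B, ... as a factor;
# responseName sets the name of the value column (default "Freq")
tidy <- as.data.frame.table(my_array, responseName = "value")

# The factor levels are created in index order, so as.integer() gives 1, 2, 3, ...
index_cols <- c("Var1", "Var2", "Var3")
tidy[index_cols] <- lapply(tidy[index_cols], as.integer)
tidy
```

Rename the Var columns with names() if you prefer labels such as Dimension1.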
Or you could also use:
cbind(which(is.finite(my_array), arr.ind = TRUE), value = c(my_array))
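One caveat with the which(is.finite(...)) version: it skips NA, NaN and Inf cells, so if any are present its rows no longer line up with c(my_array). A sketch that keeps every cell uses arrayInd(), which converts linear (column-major) positions into one index per dimension. The NA below is added purely for illustration:

```r
my_array <- array(c(1:6, 10:15), dim = c(2, 3, 2))
my_array[1, 2, 1] <- NA  # illustrative missing value, to show it is kept

# One row of indices per cell, in the same order as c(my_array)
idx <- arrayInd(seq_along(my_array), dim(my_array))
colnames(idx) <- paste0("dim", seq_along(dim(my_array)))
tidy <- data.frame(idx, value = c(my_array))
tidy
```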

Related

expand_grid with identical vectors

Problem:
Is there a simple way to get all combinations of two (or more) identical vectors, but show only the unique combinations?
Reproducible example:
library(tidyr)
x = 1:3
expand_grid(a = x,
b = x,
c = x)
# A tibble: 27 x 3
a b c
<int> <int> <int>
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 1
5 1 2 2
6 1 2 3
7 1 3 1
8 1 3 2
9 1 3 3
10 2 1 1
# ... with 17 more rows
But if row 1 2 1 exists, then I do not want to see 1 1 2 or 2 1 1, i.e. I want only the unique combinations of the three vectors (in any order).
library(gtools)
x = 1:3
df <- as.data.frame(combinations(n = 3, r = 3, v = x, repeats.allowed = TRUE))
df
Output:
V1 V2 V3
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 3
6 1 3 3
7 2 2 2
8 2 2 3
9 2 3 3
10 3 3 3
You can just sort each row and remove duplicates. Continuing from your expand_grid() call:
df <- tidyr::expand_grid(a = x,
b = x,
c = x)
data.frame(unique(t(apply(df, 1, sort))))
X1 X2 X3
1 1 1 1
2 1 1 2
3 1 1 3
4 1 2 2
5 1 2 3
6 1 3 3
7 2 2 2
8 2 2 3
9 2 3 3
10 3 3 3
Use comboGeneral from the RcppAlgos package; it's implemented in C++ and is quite fast.
x <- 1:3
RcppAlgos::comboGeneral(x, repetition=TRUE)
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 1 1 2
# [3,] 1 1 3
# [4,] 1 2 2
# [5,] 1 2 3
# [6,] 1 3 3
# [7,] 2 2 2
# [8,] 2 2 3
# [9,] 2 3 3
# [10,] 3 3 3
Note: if you're running Linux, you will need GMP installed; e.g. on Ubuntu:
sudo apt install libgmp3-dev
Base R:
x <- 1:3
df <- expand.grid(a = x,
b = x,
c = x)
# sort each row and paste with a separator so that e.g. (1, 12) and (11, 2) don't collide
df[!duplicated(apply(df, 1, function(row) paste(sort(row), collapse = ","))), ]
#> a b c
#> 1 1 1 1
#> 2 2 1 1
#> 3 3 1 1
#> 5 2 2 1
#> 6 3 2 1
#> 9 3 3 1
#> 14 2 2 2
#> 15 3 2 2
#> 18 3 3 2
#> 27 3 3 3
Created on 2021-09-09 by the reprex package (v2.0.1)
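If you'd rather avoid extra packages and the per-row apply(), here is a base-R sketch of a different technique: every unordered combination has exactly one non-decreasing arrangement, so filtering the full grid to rows with a <= b <= c keeps exactly one row per combination. It assumes the elements of x are distinct and comparable with <=.

```r
x <- 1:3
grid <- expand.grid(a = x, b = x, c = x)

# Keep only the non-decreasing arrangement of each combination
unique_combos <- grid[grid$a <= grid$b & grid$b <= grid$c, ]
rownames(unique_combos) <- NULL
unique_combos
```

The same condition can be passed to dplyr::filter() on the tidyr::expand_grid() result.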

melt the lower half of a symmetric matrix in R

Suppose I have a 3-by-3 symmetric matrix:
> x<-matrix(1:9,3)
> x[lower.tri(x)] = t(x)[lower.tri(x)]
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 4 5 8
[3,] 7 8 9
Then I use reshape2 to convert it to long format:
> library(reshape2)
> x <- melt(x)
> x
Var1 Var2 value
1 1 1 1
2 2 1 4
3 3 1 7
4 1 2 4
5 2 2 5
6 3 2 8
7 1 3 7
8 2 3 8
9 3 3 9
As the upper and lower triangles are identical, I only need half of the result, which would look like this:
Var1 Var2 value
1 1 1
2 1 4
3 1 7
2 2 5
3 2 8
3 3 9
Any elegant approach to do this?
You can set the values in the upper (or lower) triangle to NA and then melt while dropping missing values. This assumes the matrix has no missing values to begin with, or that you don't need to keep them in the result:
x[upper.tri(x)] = NA
reshape2::melt(x, na.rm = TRUE)
# Var1 Var2 value
#1 1 1 1
#2 2 1 4
#3 3 1 7
#5 2 2 5
#6 3 2 8
#9 3 3 9
Since x has already been melted, we can sort the first two columns within each row, use duplicated() to get a logical index of the non-duplicate rows, and then use it to subset the rows:
x[!duplicated(t(apply(x[1:2], 1, sort))),]
# Var1 Var2 value
#1 1 1 1
#2 2 1 4
#3 3 1 7
#5 2 2 5
#6 3 2 8
#9 3 3 9
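If you haven't melted yet, a base-R sketch that skips reshape2 and also works when the matrix contains NA is to take the positions of the lower triangle (diagonal included) with which(..., arr.ind = TRUE) and index the matrix with them:

```r
x <- matrix(1:9, 3)
x[lower.tri(x)] <- t(x)[lower.tri(x)]

# Row/column positions of the lower triangle plus diagonal, column-major order
idx <- which(lower.tri(x, diag = TRUE), arr.ind = TRUE)
half <- data.frame(Var1 = idx[, "row"], Var2 = idx[, "col"], value = x[idx])
half
```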

Abnormal Sequencing in R

I would like to create a vector of sequenced numbers such as:
1,2,3,4,5, 2,3,4,5,1, 3,4,5,1,2
That is, after each sequence is complete (say, rep(seq(1,5),3)), the first number of the previous sequence moves to the last position in the next one.
How about using %% (modulo)?
(1:5) %% 5 + 1 # left shift by 1
[1] 2 3 4 5 1
(1:5 + 1) %% 5 + 1 # left shift by 2
[1] 3 4 5 1 2
also try
(1:5 - 2) %% 5 + 1 # right shift by 1
[1] 5 1 2 3 4
(1:5 - 3) %% 5 + 1 # right shift by 2
[1] 4 5 1 2 3
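To build the whole vector from the question with the same modulo trick, outer() can compute every left rotation at once, one column per shift, and c() then reads the columns in order. rotated_seq() is just an illustrative helper name, not an existing function:

```r
# Hypothetical helper: concatenate nreps left-rotations of 1:n
rotated_seq <- function(n, nreps) {
  # column k + 1 holds 1:n rotated left by k positions (k = 0, ..., nreps - 1)
  c(outer(seq_len(n), seq_len(nreps) - 1,
          function(i, k) (i + k - 1) %% n + 1))
}

rotated_seq(5, 3)
```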
I would start off by making a matrix with one more row than the length of the series.
> lseries <- 5
> nreps <- 3
> (values <- matrix(1:lseries, nrow = lseries + 1, ncol = nreps))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
[4,] 4 5 1
[5,] 5 1 2
[6,] 1 2 3
This may throw a warning (In matrix(1:lseries, nrow = lseries + 1, ncol = nreps) : data length [5] is not a sub-multiple or multiple of the number of rows [6]), which you can ignore here. Note that the first lseries rows contain the data you want. We can get the final result using:
> as.vector(values[1:lseries, ])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2
Here's a method to get a matrix of all the rotations:
matrix(1:5, 5, 6, byrow=TRUE)[, -6]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 2 3 4 5 1
[3,] 3 4 5 1 2
[4,] 4 5 1 2 3
[5,] 5 1 2 3 4
or turn it into a list
split.default(matrix(1:5, 5, 6, byrow=TRUE)[, -6], 1:5)
$`1`
[1] 1 2 3 4 5
$`2`
[1] 2 3 4 5 1
$`3`
[1] 3 4 5 1 2
$`4`
[1] 4 5 1 2 3
$`5`
[1] 5 1 2 3 4
or into a vector with c() (this contains all five rotations; append [1:15] to keep only the first three):
c(matrix(1:5, 5, 6, byrow=TRUE)[, -6])
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4
For the sake of variety, here is a second method to return the vector:
# construct the larger vector
temp <- rep(1:5, 6)
# for each value x, use which() to find the position of its (x+1)-th occurrence, which is the one to drop
temp[-sapply(1:5, function(x) which(temp == x)[x+1])]
[1] 1 2 3 4 5 2 3 4 5 1 3 4 5 1 2 4 5 1 2 3 5 1 2 3 4

R merging with a preference

Suppose you have a matrix that consists of two columns of only 1's and 2's.
A B
1 2
2 2
1 1
2 1
2 1
2 2
2 1
How would you merge these two columns into one so that 2 always overwrites 1?
Desired Output:
C
2
2
1
2
2
2
2
Assuming that the data is stored in a dataframe named df, you can use
df$C <- pmax(df$A, df$B)
to create a new column C with the desired result.
In the case of a matrix m you can use
m <- cbind(m, pmax(m[,1], m[,2]))
colnames(m) <- LETTERS[1:ncol(m)]
#> m
# A B C
#[1,] 1 2 2
#[2,] 2 2 2
#[3,] 1 1 1
#[4,] 2 1 2
#[5,] 2 1 2
#[6,] 2 2 2
#[7,] 2 1 2
#> class(m)
#[1] "matrix" "array"   # R >= 4.0.0; older versions return just "matrix"
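Because pmax() accepts any number of vectors, do.call() can hand it several columns at once, which scales beyond two columns without changing the call. A sketch using the data from the question (df is rebuilt here so the example runs on its own):

```r
df <- data.frame(A = c(1, 2, 1, 2, 2, 2, 2),
                 B = c(2, 2, 1, 1, 1, 2, 1))

# A data frame is a list of columns, so do.call() passes each selected
# column to pmax() as a separate argument
df$C <- do.call(pmax, df[c("A", "B")])
df
```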
Without ifelse:
df$C1 <- apply(df[, c("A", "B")], 1, max)
With ifelse:
df$C2 <- with(df, ifelse(A==1&B==1,1,2))
Result
> df
A B C1 C2
1 1 2 2 2
2 2 2 2 2
3 1 1 1 1
4 2 1 2 2
5 2 1 2 2
6 2 2 2 2
7 2 1 2 2

convert rows after column

I have a CSV file that looks like this:
1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1
What I want to do is to this:
1 5 4 6 7 2
2 3 5 3 8 1
3 2 6 7 9 1
i.e. after every third row, I want the next three rows placed side by side as additional columns. Any advice?
Thanks a lot
Here's a way to do this with matrix indexing. It's a bit strange, but I find it interesting so I will post it.
You want an index matrix, with indices as follows. This gives the order of your data as a matrix (column-major order):
1, 1
2, 1
3, 1
1, 2
2, 2
3, 2
4, 1
...
8, 2
9, 2
This gives the pattern that you need to select the elements. Here's one approach to building such a matrix. Say that your data is in the object dat, a data frame or matrix:
m <- matrix(
c(
outer(rep(1:3, 2), seq(0,nrow(dat)-1,by=3), FUN='+'),
rep(rep(1:2, each=3), nrow(dat)/3)
),
ncol=2
)
The outer expression is the first column of the desired index matrix, and the rep expression is the second column. Now just index dat with this index matrix, and build a result matrix with three rows:
matrix(dat[m], nrow=3)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 5 4 6 7 2
## [2,] 2 3 5 3 8 1
## [3,] 3 2 6 7 9 1
Alternatively, split the rows into blocks of three and bind the blocks column-wise:
a <- read.table(text = "1 5
2 3
3 2
4 6
5 3
6 7
7 2
8 1
9 1")
(seq_len(nrow(a))-1) %/% 3
# [1] 0 0 0 1 1 1 2 2 2
split(a, (seq_len(nrow(a))-1) %/% 3)
# $`0`
# V1 V2
# 1 1 5
# 2 2 3
# 3 3 2
# $`1`
# V1 V2
# 4 4 6
# 5 5 3
# 6 6 7
# $`2`
# V1 V2
# 7 7 2
# 8 8 1
# 9 9 1
do.call(cbind,split(a, (seq_len(nrow(a))-1) %/% 3))
# 0.V1 0.V2 1.V1 1.V2 2.V1 2.V2
# 1 1 5 4 6 7 2
# 2 2 3 5 3 8 1
# 3 3 2 6 7 9 1
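For a numeric data frame, another base-R sketch is to reshape through a 3D array: view the data as (row within block, block, column), swap the last two dimensions with aperm(), and flatten. stack_blocks() is a hypothetical helper name, and it assumes the number of rows is a multiple of k:

```r
# Hypothetical helper: place consecutive blocks of k rows side by side
stack_blocks <- function(dat, k) {
  mat <- as.matrix(dat)
  stopifnot(nrow(mat) %% k == 0)
  # dimensions: row within block, block number, original column
  arr <- array(mat, dim = c(k, nrow(mat) / k, ncol(mat)))
  # reorder to (row within block, original column, block) and flatten,
  # so each block's columns end up next to each other
  matrix(aperm(arr, c(1, 3, 2)), nrow = k)
}

a <- data.frame(V1 = 1:9, V2 = c(5, 3, 2, 6, 3, 7, 2, 1, 1))
stack_blocks(a, 3)
```

Unlike the split() version, this returns a plain matrix without the 0.V1-style column names.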
