Inserting a Dataframe/List into a Dataframe Element in R

I'd like to insert a dataframe into a dataframe element, such that if I called df1[1,1] I would get:
[A B]
[C D]
I thought this was possible in R but perhaps I am mistaken. In a project of mine, I am essentially working with a 50x50 matrix, where I'd like each element to contain a column of data consisting of numbers with labeled rows.
Trying to do something like df1[1,1] <- df2 yields the following warning:
Warning message:
In [<-.data.frame(*tmp*, i, j, value = list(DJN.10 = c(0, 3, :
replacement element 1 has 144 rows to replace 1 rows
And calling df1[1,1] yields 0. I've tried inserting the data in various ways, such as with as.vector() and as.list(), without success.

Perhaps a matrix could work for you, like so:
x <- matrix(list(), nrow=2, ncol=3)
print(x)
# [,1] [,2] [,3]
#[1,] NULL NULL NULL
#[2,] NULL NULL NULL
x[[1,1]] <- data.frame(a=c("A","C"), b=c("B","D"))
x[[1,2]] <- data.frame(c=2:3)
x[[2,3]] <- data.frame(x=1, y=2:4)
x[[2,1]] <- list(1,2,3,5)
x[[1,3]] <- list("a","b","c","d")
x[[2,2]] <- list(1:5)
print(x)
# [,1] [,2] [,3]
#[1,] List,2 List,1 List,4
#[2,] List,4 List,1 List,2
x[[1,1]]
# a b
#1 A B
#2 C D
class(x)
#[1] "matrix" "array"   # R >= 4.0.0; older versions print only "matrix"
typeof(x)
#[1] "list"
See here for details.
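As a rough sketch of how this could carry over to the 50x50 case from the question, the list-matrix can be created once and filled cell by cell. The names grid and cell and the 3x3 size are made up for illustration. Note that [[i, j]] returns the stored object, while [i, j] returns a list of length one wrapping it.

```r
# Hypothetical stand-in for the 50 x 50 grid from the question (3 x 3 here).
n <- 3
grid <- matrix(list(), nrow = n, ncol = n)

# One column of numbers with labeled rows, as described in the question.
cell <- data.frame(value = c(0, 3, 7), row.names = c("a", "b", "c"))
grid[[1, 2]] <- cell

stored  <- grid[[1, 2]]  # the data frame itself
wrapped <- grid[1, 2]    # a list of length 1 containing the data frame
empty   <- grid[[1, 1]]  # NULL: nothing has been stored in this cell yet
```

Using [[ ]] for both assignment and extraction avoids the "replacement element has N rows" warning that the data.frame assignment in the question triggers.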

Each column in your data.frame can be a list. Just make sure that the list is as long as the number of rows in your data.frame.
Columns can be added using the standard $ notation.
Example:
x <- data.frame(matrix(NA, nrow=2, ncol=3))
x$X1 <- I(list(data.frame(a=c("A","C"), b=c("B","D")), matrix(1:10, ncol = 5)))
x$X2 <- I(list(data.frame(c = 2:3), list(1, 2, 3, 4)))
x$X3 <- I(list(list("a", "b", "c"), 1:5))
x
# X1 X2 X3
# 1 1:2, 1:2 2:3 a, b, c
# 2 1, 2, 3,.... 1, 2, 3, 4 1, 2, 3,....
x[1, 1]
# [[1]]
# a b
# 1 A B
# 2 C D
#
x[2, 1]
# [[1]]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 3 5 7 9
# [2,] 2 4 6 8 10
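Since x[1, 1] above still returns a list wrapping the stored object, getting at the object itself takes one more step. A minimal sketch, with made-up column names id and payload:

```r
# Hypothetical two-row data frame with one list column.
x <- data.frame(id = 1:2)
x$payload <- I(list(
  data.frame(a = c("A", "C"), b = c("B", "D")),
  matrix(1:10, ncol = 5)
))

wrapped <- x[1, "payload"]  # list of length 1 (still wrapped)
inner   <- x$payload[[1]]   # the data frame stored in row 1
second  <- x$payload[[2]]   # the 2 x 5 matrix stored in row 2
```

Extracting the column first and then using [[ ]] on it keeps the two levels (row of the outer data frame, contents of the cell) explicit.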

Related

Matching matrix rows with list elements in R

Reproducible data:
dat1 <- matrix(0, nrow = 9, ncol = 2)
dat1[,1] <- rep(1:3,3)
dat1[,2] <- c(1,1,1,2,2,2,3,3,3)
dat2 <- list()
dat2[[1]] <- matrix(c(1,2,1,3), nrow = 2, ncol = 2)
dat2[[2]] <- matrix(c(1,1,2,3,1,3), nrow = 3, ncol = 2 )
> dat1
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 1 2
[5,] 2 2
[6,] 3 2
[7,] 1 3
[8,] 2 3
[9,] 3 3
> dat2
[[1]]
[,1] [,2]
[1,] 1 1
[2,] 2 3
[[2]]
[,1] [,2]
[1,] 1 3
[2,] 1 1
[3,] 2 3
I have a matrix (dat1) and a list of matrices (dat2).
Some rows of dat1 are the same as rows of the matrices in dat2. My objective is to find the row numbers of dat1 that match each row in dat2 and store them in a list. An example of the output:
> ex.result
[[1]]
[,1]
[1,] 1
[2,] 8
[[2]]
[,1]
[1,] 7
[2,] 1
[3,] 8
I am looking for a fast way to do this without using time-consuming loops.
A slightly different approach (the %>% pipe requires magrittr):
library(magrittr)
lapply(dat2, function(m) {
  apply(m, 1, function(r)
    which(apply(sweep(dat1, 2, r, "=="), 1, all))) %>% as.matrix
})
Output:
[[1]]
[,1]
[1,] 1
[2,] 8
[[2]]
[,1]
[1,] 7
[2,] 1
[3,] 8
Here is an option:
lapply(dat2, function(mat)
apply(mat, 1, function(row)
match(toString(row), apply(dat1, 1, toString))))
#[[1]]
#[1] 1 8
#
#[[2]]
#[1] 7 1 8
This returns a list with integer vectors instead of a list with array/matrix entries though.
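The option above rebuilds the row keys of dat1 for every row it looks up. A possible variation, sketched here under the assumption that pasting the columns together gives unambiguous keys (true for the integer-valued data in the question), builds the keys once and uses a vectorized paste. The helper name row_keys is made up for this sketch.

```r
dat1 <- matrix(0, nrow = 9, ncol = 2)
dat1[, 1] <- rep(1:3, 3)
dat1[, 2] <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
dat2 <- list(
  matrix(c(1, 2, 1, 3), nrow = 2, ncol = 2),
  matrix(c(1, 1, 2, 3, 1, 3), nrow = 3, ncol = 2)
)

# Hypothetical helper: one string key per row, pasted column-wise.
row_keys <- function(m) do.call(paste, c(as.data.frame(m), sep = ","))

keys1 <- row_keys(dat1)  # computed once for the lookup table
res <- lapply(dat2, function(m) match(row_keys(m), keys1))
```

Like the match() answer above, this returns integer vectors; wrapping the result of match() in matrix() would give one-column matrices instead.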
In the same vein as above, using Map() and vector recycling:
# Coercing to a data.frame to recycle the vector that is used to search:
setNames(
Map(function(x, y){
matrix(
match(
apply(y, 1, paste, collapse = ", "),
x
)
)
},
data.frame(apply(dat1, 1, paste, collapse = ", ")),
dat2),
seq_len(length(dat2)))

Create a symmetric matrix from circular shifts of a vector

I'm struggling with the creation of a symmetric matrix.
Let's say a vector v <- c(1,2,3)
I want to create a matrix like this:
matrix(ncol = 3, nrow = 3, c(1,2,3,2,3,1,3,1,2), byrow = FALSE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 1
[3,] 3 1 2
(This is just a reprex; I have many vectors with different lengths.)
Notice this is a symmetric matrix with diagonal c(1,3,2) (different from vector v) and the manual process to create the matrix would be like this:
Using the first row as base (vector v) the process is to fill the empty spaces with the remaining values on the left side.
Any help is appreciated. Thanks!
Let me answer my own question in order to close it properly, using the incredibly simple solution from Henrik's comment:
matrix(v, nrow = 3, ncol = 4, byrow = TRUE)[ , 1:3]
Maybe the byrow = TRUE matches the three steps of the illustration best conceptually, but the output is the same with:
matrix(v, nrow = 4, ncol = 3)[1:3, ]
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 1
# [3,] 3 1 2
Because there may be "many vectors with different lengths", it could be convenient to make a simple function and apply it to the vectors stored in a list:
cycle = function(x){
len = length(x)
matrix(x, nrow = len + 1, ncol = len)[1:len , ]
}
l = list(v1 = 1:3, v2 = letters[1:4])
lapply(l, cycle)
# $v1
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 2 3 1
# [3,] 3 1 2
#
# $v2
# [,1] [,2] [,3] [,4]
# [1,] "a" "b" "c" "d"
# [2,] "b" "c" "d" "a"
# [3,] "c" "d" "a" "b"
# [4,] "d" "a" "b" "c"
Another option is to use Reduce and make c(v[-1], v[1]) accumulative.
do.call(rbind, Reduce(function(x, y) c(x[-1], x[1]), v[-1], v, accumulate = TRUE))
# [,1] [,2] [,3]
#[1,] 1 2 3
#[2,] 2 3 1
#[3,] 3 1 2
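Yet another way to look at it, offered only as a sketch: entry (i, j) of the matrix is element ((i + j - 2) mod n) + 1 of the vector, which also makes the symmetry obvious because the index depends only on i + j. The function name cycle_idx is made up here.

```r
cycle_idx <- function(x) {
  n <- length(x)
  # Index of the element of x that belongs in cell (i, j).
  idx <- outer(seq_len(n), seq_len(n), function(i, j) (i + j - 2) %% n + 1)
  matrix(x[idx], nrow = n)
}

m3 <- cycle_idx(1:3)
m4 <- cycle_idx(letters[1:4])
```

This avoids relying on recycling when filling the matrix, at the cost of building an n x n index matrix first.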

Find common values between matrices and return matrix with row-col position

I'd like to find the values shared between two matrices, and return their locations (row-col) in a matrix.
set.seed(123)
m <- matrix(sample(4), 2, 2, byrow = T)
# m
# [,1] [,2]
# [1,] 2 3
# [2,] 1 4
m2 <- matrix(sample(4), 2, 2, byrow = F)
# m2
# [,1] [,2]
# [1,] 4 2
# [2,] 1 3
Expected output:
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Bonus if this could be generalized to non-identical matrices (different dim).
Equal sizes
One option would be
replace(m * NA, m == m2, paste(row(m), col(m), sep = "-")[m == m2])
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Different sizes
I believe that in this case, regardless of the approach, you will first need to trim both matrices to be of equal size.
set.seed(12)
(m <- matrix(sample(6), 2, 3, byrow = TRUE))
# [,1] [,2] [,3]
# [1,] 1 5 4
# [2,] 6 3 2
(m2 <- matrix(sample(6), 3, 2, byrow = FALSE))
# [,1] [,2]
# [1,] 2 5
# [2,] 4 3
# [3,] 1 6
out <- matrix(NA, max(nrow(m), nrow(m2)), max(ncol(m), ncol(m2)))
mrow <- min(nrow(m), nrow(m2))
mcol <- min(ncol(m), ncol(m2))
mTrim <- m[1:mrow, 1:mcol]
m2Trim <- m2[1:mrow, 1:mcol]
out[1:mrow, 1:mcol][mTrim == m2Trim] <- paste(row(mTrim), col(mTrim), sep = "-")[mTrim == m2Trim]
out
# [,1] [,2] [,3]
# [1,] NA "1-2" NA
# [2,] NA "2-2" NA
# [3,] NA NA NA
The function below gives the desired output, but only if the two matrices have the same dim().
To generalize it to matrices of different dimensions, one solution would be to subset the bigger matrix first.
The key is which(mat1==mat2, arr.ind=T), which returns the row-col indices:
which(m==m2, arr.ind=T)
row col
[1,] 2 1
Inside a function:
find_in_matr <- function(mat1, mat2) {
if (!all(dim(mat1) == dim(mat2))) {
stop("mat1 and mat2 need to have the same dim()!")
}
m <- mat1
m[] <- NA # copy mat1 dim, and empty values
loc <- which(mat1==mat2, arr.ind=T) # find positions (both indxs)
m[loc] <- mapply(paste, sep="-", loc[, 1], loc[, 2]) # paste indxs
return(m)
}
Example:
set.seed(123)
m <- matrix(sample(4), 2, 2, byrow = T)
# m
# [,1] [,2]
# [1,] 2 3
# [2,] 1 4
m2 <- matrix(sample(4), 2, 2, byrow = F)
# m2
# [,1] [,2]
# [1,] 4 2
# [2,] 1 3
find_in_matr(m, m2)
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
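The idea of subsetting the bigger matrix first, mentioned in this answer, could look roughly like the sketch below. The function name find_in_common_block is made up, and one design choice is an assumption: the result has the size of the overlapping block, not of the larger matrix. The test matrices are typed in by hand so that no random seed is involved.

```r
find_in_common_block <- function(mat1, mat2) {
  # Size of the top-left block that both matrices share.
  nr <- min(nrow(mat1), nrow(mat2))
  nc <- min(ncol(mat1), ncol(mat2))
  a <- mat1[seq_len(nr), seq_len(nc), drop = FALSE]
  b <- mat2[seq_len(nr), seq_len(nc), drop = FALSE]

  out <- matrix(NA_character_, nr, nc)
  loc <- which(a == b, arr.ind = TRUE)           # row/col of equal cells
  out[loc] <- paste(loc[, 1], loc[, 2], sep = "-")
  out
}

m  <- matrix(c(1, 6, 5, 3, 4, 2), nrow = 2, ncol = 3)  # rows: 1 5 4 / 6 3 2
m2 <- matrix(c(2, 4, 1, 5, 3, 6), nrow = 3, ncol = 2)  # rows: 2 5 / 4 3 / 1 6
res <- find_in_common_block(m, m2)
```

Here the shared block is 2 x 2, and the cells (1, 2) and (2, 2) hold equal values in both matrices.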
Silly piped version
library(magrittr)
(m == m2) %>%
`[<-`(!., NA) %>%
`[<-`((w <- which(., arr = T)), apply(w, 1, paste, collapse = '-'))
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
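I tried to do it with ifelse():
x <- paste(row(m), col(m), sep = "-")  # "row-col" label for every cell
ifelse(m != m2, NA, x)
# [,1] [,2]
# [1,] NA NA
# [2,] "2-1" NA
Note that x must hold a label for every cell: if it holds only the labels of the matching cells (e.g. built from which(m == m2, arr.ind = T)), ifelse() recycles it and the labels land on the wrong positions whenever there is more than one match.
This method works for matrices of any size, as long as both have the same dimensions.
e.g.
set.seed(999)
m1 <- matrix(sample(1:3, 12, replace = T), 3, 4)
m2 <- matrix(sample(1:3, 12, replace = T), 3, 4)
x <- paste(row(m1), col(m1), sep = "-")
ifelse(m1 != m2, NA, x)
# [,1] [,2] [,3] [,4]
# [1,] NA "1-2" NA "1-4"
# [2,] NA NA "2-3" NA
# [3,] "3-1" NA NA "3-4"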

How to select sub-arrays from an n-dimensional array n-agnostically?

Short version:
How does one programmatically select sub-arrays from an n-dimensional array when n is arbitrary?
(If the short version of this question is clear enough, feel free to skip the rest of this post.)
Suppose that A is an array such that dim(A) is the vector of positive integers (d1, d2, …, dn), with n > 2.
For example:
> d <- 5:2
> set.seed(0)
> A <- array(runif(prod(d)), dim = d)
Here the array A corresponds to the definition given earlier, with n = 4, and dk = 6 - k, for k ∈ {1, 2, 3, 4}.
Then, if 1 ≤ i ≤ d1 and 1 ≤ j ≤ d2, the expression A[i, j … ] (where … is a placeholder for n - 2 commas) evaluates to an (n - 2)-dimensional array.
To continue the previous example, if we take i = 3 and j = 2, my notation A[i, j … ] would denote the (n - 2 = 2)-dimensional array shown below:
> A[3, 2, ,]
[,1] [,2]
[1,] 0.94467527 0.4785452
[2,] 0.01339033 0.7111212
[3,] 0.02333120 0.1293723
More generally, if
1 ≤ k1 < k2 < … < km ≤ n
and
1 ≤ ir ≤ dkr, ∀r ∈ {1, …, m}, then an expression of the general form
A[ … i1 … i2 … … im … ]
...(where the …'s are placeholders for sequences of indices ik and commas), evaluates to an (n - m)-dimensional array.
For example,
> d <- c(4, 2, 5, 4, 2, 7, 3)
> set.seed(1)
> A <- array(runif(prod(d)), dim = d)
> A[3, 1, 4, , 1, 6, ]
[,1] [,2] [,3]
[1,] 0.5320469 0.77282382 0.18034186
[2,] 0.6817434 0.08627063 0.77227529
[3,] 0.8572805 0.32337850 0.63322550
[4,] 0.6555618 0.20578391 0.01257377
Now, one can write out expressions like A[i, j … ] and
A[ … i1 … i2 … … im … ] in full (i.e. filling in all the … placeholders) only if one knows n.
Of course, when one is working interactively, one usually knows (or can easily find out) what n is, and can use this knowledge to decide how many commas to insert in, e.g., A[i, j … ]. This is not the case, however, when one is writing code to work with multi-dimensional arrays of any number of dimensions.
How would one express selections such as A[i, j … ] and A[ … i1 … i2 … … im … ] when one does not know n?
Perhaps this will work for you:
func <- function(ary, ..., drop = TRUE) {
d <- length(dim(ary))
dots <- list(...)
if (length(dots) > d) stop("incorrect number of dimensions")
rest <- rep(TRUE, d - length(dots))
do.call(`[`, c(list(ary), c(dots, rest, drop = drop)))
}
Using your data:
d <- rev(2:5)
set.seed(0)
A <- array(runif(prod(d)), dim = d)
You normally need to know how many commas to include for the correct dimensionality:
A[3,2]
# Error in A[3, 2] : incorrect number of dimensions
This function "fills in" the rest of it for you:
func(A, 3, 2)
# [,1] [,2]
# [1,] 0.94467527 0.4785452
# [2,] 0.01339033 0.7111212
# [3,] 0.02333120 0.1293723
func(A, 3)
# , , 1
# [,1] [,2] [,3]
# [1,] 0.3721239 0.21214252 0.6470602
# [2,] 0.9446753 0.01339033 0.0233312
# [3,] 0.1765568 0.59956583 0.8612095
# [4,] 0.7176185 0.79423986 0.3162717
# , , 2
# [,1] [,2] [,3]
# [1,] 0.2936034 0.71251468 0.3531973
# [2,] 0.4785452 0.71112122 0.1293723
# [3,] 0.8394404 0.05893438 0.7317925
# [4,] 0.8643395 0.45527445 0.7155661
It correctly handles all dimensions:
A[3,2,1,1]
# [1] 0.9446753
func(A, 3, 2, 1, 1)
# [1] 0.9446753
And errors similarly with too many dimensions:
A[3,2,1,1,1]
# Error in A[3, 2, 1, 1, 1] : incorrect number of dimensions
func(A, 3, 2, 1, 1, 1)
# Error in func(A, 3, 2, 1, 1, 1) (from #4) : incorrect number of dimensions
Edit: and the part that I missed. In order to catch blanks, we need to have a little fun.
func <- function(ary, ..., drop = TRUE) {
d <- length(dim(ary))
dots <- as.list(match.call()[-(1:2)])
if (length(dots) > d) stop("incorrect number of dimensions")
pf <- parent.frame()
dots <- lapply(seq_along(dots), function(i) {
x <- dots[[i]]
if (missing(x)) TRUE else eval(dots[[i]], env = pf)
})
rest <- rep(TRUE, d - length(dots))
do.call(`[`, c(list(ary), c(dots, rest, drop = drop)))
}
I had a simpler version of this function (without the lapply), but it tended to fail if any of the positional arguments were variables rather than literals.
d <- c(4, 2, 5, 4, 2, 7, 3)
set.seed(1)
A <- array(runif(prod(d)), dim = d)
A[3, 1, 4, , 1, 6, ]
# [,1] [,2] [,3]
# [1,] 0.007668596 0.1818094 0.3278203
# [2,] 0.286473525 0.4119333 0.4825088
# [3,] 0.008869468 0.4767760 0.7649491
# [4,] 0.330141563 0.3438217 0.8710419
func(A, 3, 1, 4, , 1, 6)
# [,1] [,2] [,3]
# [1,] 0.007668596 0.1818094 0.3278203
# [2,] 0.286473525 0.4119333 0.4825088
# [3,] 0.008869468 0.4767760 0.7649491
# [4,] 0.330141563 0.3438217 0.8710419
i <- 3
func(A, i, 1, 2+2, , 1, 6)
# [,1] [,2] [,3]
# [1,] 0.007668596 0.1818094 0.3278203
# [2,] 0.286473525 0.4119333 0.4825088
# [3,] 0.008869468 0.4767760 0.7649491
# [4,] 0.330141563 0.3438217 0.8710419
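For the fully general form, where the fixed dimensions are known only at run time, an alternative sketch is to pass the positions and the indices as two vectors instead of relying on blank arguments. The name slice_dims and its arguments dims and idx are made up for this sketch, and it only handles a single index per fixed dimension.

```r
slice_dims <- function(ary, dims, idx, drop = TRUE) {
  # Start with "take everything" for every dimension ...
  args <- rep(list(TRUE), length(dim(ary)))
  # ... then pin the requested dimensions to the requested indices.
  args[dims] <- as.list(idx)
  do.call(`[`, c(list(ary), args, list(drop = drop)))
}

A4 <- array(seq_len(prod(5:2)), dim = 5:2)
s4 <- slice_dims(A4, dims = c(1, 2), idx = c(3, 2))  # same as A4[3, 2, , ]

d7 <- c(4, 2, 5, 4, 2, 7, 3)
A7 <- array(seq_len(prod(d7)), dim = d7)
s7 <- slice_dims(A7, dims = c(1, 2, 3, 5, 6), idx = c(3, 1, 4, 1, 6))
```

Because the call is assembled as a plain list, nothing here depends on n being known when the code is written.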

Output converted from matrix to vector in apply

I want to apply a function over one margin (columns, in my example) of a matrix. The problem is that the function returns a matrix, and apply() flattens each result to a vector, so the overall result is a matrix. My goal is to get a three-dimensional array. Here is an example (note that matrix() is not the function of interest, just an example):
x <- matrix(1:12, 4, 3)
apply(x, 2, matrix, nrow = 2, ncol = 2)
The output is exactly the same as the input. I have a pretty dull solution to this:
library(abind)
library(magrittr)  # provides %>%
abind2 <- function (x, ...)
  abind(x, ..., along = length(dim(x)) + 1)  # bind along a new last dimension
apply(x, 2, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
I believe there must exist something better than this. Something that does not include list()ing and unlist()ing columns.
Edit:
Also, the solution should be easily applicable to an array of any dimension with any choice of MARGIN, which my solution is not.
In this example, for instance, I want it to return a 4-dimensional array.
x <- array(1:24, c(4,3,2))
apply(x, 2:3, list) %>%
lapply(unlist) %>%
lapply(matrix, nrow = 2, ncol = 2) %>%
do.call(what = 'abind2')
Not that complicated at all. Simply use
array(x, dim = c(2, 2, ncol(x)))
Matrices and general arrays are stored in memory column by column as one long 1D vector, so you can simply reassign the dimensions.
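A small check of that claim, as a sketch: each slice of the re-dimensioned array is the same as reshaping the corresponding column by hand, and the same holds one dimension up for the 3D example from the question. The variable names a3, x3 and a4 are made up here.

```r
x <- matrix(1:12, 4, 3)
a3 <- array(x, dim = c(2, 2, ncol(x)))  # one 2 x 2 slice per column of x

x3 <- array(1:24, c(4, 3, 2))
a4 <- array(x3, dim = c(2, 2, dim(x3)[2:3]))  # 2 x 2 x 3 x 2

# Slice k of a3 equals column k of x reshaped to 2 x 2.
same_slices <- all(vapply(
  seq_len(ncol(x)),
  function(k) identical(a3[, , k], matrix(x[, k], 2, 2)),
  logical(1)
))
```

This only works because the function of interest here is a pure reshape; a function that changes the values still needs to be applied slice by slice.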
OK, here is possibly what you want to do in general:
tapply(x, col(x), FUN = matrix, nrow = 2, ncol = 2)
#$`1`
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#
#$`2`
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8
#
#$`3`
# [,1] [,2]
#[1,] 9 11
#[2,] 10 12
You can try to convert your matrix into a data.frame and use lapply to apply your function on the columns (as a data.frame is a list), it will return a list, where each element represents the function result for a column:
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
EDIT with the second definition of x:
x <- array(1:24, c(4,3,2))
lapply(as.data.frame(x), matrix, nrow = 2, ncol = 2)
# $V1
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
# $V2
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
# $V3
# [,1] [,2]
# [1,] 9 11
# [2,] 10 12
# $V4
# [,1] [,2]
# [1,] 13 15
# [2,] 14 16
# $V5
# [,1] [,2]
# [1,] 17 19
# [2,] 18 20
# $V6
# [,1] [,2]
# [1,] 21 23
# [2,] 22 24
EDIT2: an attempt to get an array as the result
Based on this similar question, you may try this code:
x <- array(1:24, c(4,3,2))
sapply(1:3,
function(y) sapply(1:ncol(x[, y, ]),
function(z) matrix(x[,y,z], ncol=2, nrow=2),
simplify="array"),
simplify="array")
Dimension of the result is 2 2 2 3.
Actually, the problem here is that two different calls to apply are needed when x is an array with more than 2 dimensions. In the last example of the question (with x <- array(1:24, c(4,3,2))), we want to apply, to each element of the third dimension, a function that applies the matrix function to each element of the second dimension.
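One way to avoid the nested calls, sketched here under the assumption that the function returns a matrix or array of the same shape for every slice, is to let apply() flatten the results and then restore the lost dimensions afterwards. The wrapper name apply_array is made up for this sketch.

```r
apply_array <- function(x, margin, f, ...) {
  out_dim <- NULL
  flat <- apply(x, margin, function(s) {
    r <- f(s, ...)
    out_dim <<- dim(r)  # remember the shape apply() is about to flatten
    r
  })
  # Result dimensions: shape of one result, then the margins iterated over.
  array(flat, dim = c(out_dim, dim(x)[margin]))
}

x2 <- matrix(1:12, 4, 3)
r3 <- apply_array(x2, 2, matrix, nrow = 2, ncol = 2)    # 2 x 2 x 3

x3 <- array(1:24, c(4, 3, 2))
r4 <- apply_array(x3, 2:3, matrix, nrow = 2, ncol = 2)  # 2 x 2 x 3 x 2
```

The same call shape works for any MARGIN, which was the requirement added in the question's edit.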
