I want to run mantelhaen.test in R, which requires 2x2 contingency tables in 3D array form. These tables can be constructed by looping over each row of the dataframe, but I am trying to figure out if there is a vectorised way to do it, i.e. using apply(df[, c("col1", "col2", etc)], MARGIN=1, function(x) array(x, c(2,2,11))) to make a 3D array for each row of the table (which would then be wrapped in mantelhaen.test).
I have previously got this to work using matrix() for fisher.test in R and Pandas, but in this case I am running into an issue where array() doesn't seem to have any effect on the data. Here is a reproducible example:
df = data.frame(group1_variant_cases = c(2,1,3,0,0,2), group1_nonvariant_cases = c(100,92,33,40,21,87),
group1_variant_controls = c(1,2,1,0,2,1), group1_nonvariant_controls = c(45,61,70,71,31,55),
group2_variant_cases = c(0,2,1,0,1,0), group2_nonvariant_cases = c(201,99,213,52,178,98),
group2_variant_controls = c(1,0,0,0,1,2), group2_nonvariant_controls = c(67,43,12,88,91,73))
apply(head(df,1), 1, function(x) array(x, c(2,2,2)))
Output:
1
[1,] 2
[2,] 100
[3,] 1
[4,] 45
[5,] 0
[6,] 201
[7,] 1
[8,] 67
Any help appreciated!
apply has a simplify argument, which is TRUE by default. Change it to FALSE and it works. According to ?apply:
If each call to FUN returns a vector of length n, and simplify is TRUE, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise.
apply(head(df,3), 1, function(x) array(x, c(2,2,2)), simplify = FALSE)
Output:
$`1`
, , 1
[,1] [,2]
[1,] 2 1
[2,] 100 45
, , 2
[,1] [,2]
[1,] 0 1
[2,] 201 67
$`2`
, , 1
[,1] [,2]
[1,] 1 2
[2,] 92 61
, , 2
[,1] [,2]
[1,] 2 0
[2,] 99 43
$`3`
, , 1
[,1] [,2]
[1,] 3 1
[2,] 33 70
, , 2
[,1] [,2]
[1,] 1 0
[2,] 213 12
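For R versions without the simplify argument (as far as I know it was added to apply in R 4.1.0), a similar list of arrays can be built with lapply over the row indices. This is only a sketch: the small data frame and the name row_to_array are made up for illustration, and the 2x2x2 layout assumes the columns are ordered as in the question.

```r
# Hypothetical two-row data frame with the same column order as in the question
df <- data.frame(g1_var_cases = c(2, 1), g1_nonvar_cases = c(100, 92),
                 g1_var_ctrls = c(1, 2), g1_nonvar_ctrls = c(45, 61),
                 g2_var_cases = c(0, 2), g2_nonvar_cases = c(201, 99),
                 g2_var_ctrls = c(1, 0), g2_nonvar_ctrls = c(67, 43))

# Illustrative helper: reshape row i into a 2 x 2 x 2 array (column-major fill)
row_to_array <- function(i) array(unlist(df[i, ]), dim = c(2, 2, 2))

# One array per row, returned as a list (no simplification involved)
arrays <- lapply(seq_len(nrow(df)), row_to_array)
dim(arrays[[1]])
```

In principle, each list element can then be passed to mantelhaen.test in turn.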
I have a list containing matrices of the same size in R. I would like to apply a function over the same element of all matrices. Example:
> a <- matrix(1:4, ncol = 2)
> b <- matrix(5:8, ncol = 2)
> c <- list(a,b)
> c
[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[[2]]
[,1] [,2]
[1,] 5 7
[2,] 6 8
Now I want to apply the mean function and would like to get a matrix like that:
[,1] [,2]
[1,] 3 5
[2,] 4 6
One conceptual way to do this would be to sum up the matrices and then take the average value of each entry. Try using Reduce:
Reduce('+', c) / length(c)
Output:
[,1] [,2]
[1,] 3 5
[2,] 4 6
Another option is to construct an array and then use apply.
Step 1: construct the array.
Using the abind library and do.call, you can do this:
library(abind)
myArray <- do.call(function(...) abind(..., along=3), c)
Using base R, you can strip out the structure and then rebuild it like this:
myArray <- array(unlist(c), dim=c(dim(a), length(c)))
In both instances, these return the desired array
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
Step 2: use apply to calculate the mean along the first and second dimensions.
apply(myArray, 1:2, mean)
[,1] [,2]
[1,] 3 5
[2,] 4 6
This will be more flexible than Reduce, since you can swap out many more functions, but it will be slower for this particular application.
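If speed matters but the array form is preferred, the specific case of the mean can also be handled by rowMeans with dims = 2, which averages over the remaining (third) dimension. A minimal sketch, with the list renamed to mats here to avoid masking the base function c:

```r
a <- matrix(1:4, ncol = 2)
b <- matrix(5:8, ncol = 2)
mats <- list(a, b)

# Stack the matrices into a 2 x 2 x 2 array
myArray <- array(unlist(mats), dim = c(dim(a), length(mats)))

# Treat the first two dimensions as "rows" and average over the third
elementwise_mean <- rowMeans(myArray, dims = 2)
elementwise_mean
```

This only covers means (and, with rowSums, sums); for other functions the apply version remains the general route.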
If I have an n-dimensional array, it can be sliced by an m * n matrix like this:
a <- array(1:27,c(3,3,3))
b <- matrix(rep(1:3,3),3)
# This will return the elements a[1,1,1], a[2,2,2] and a[3,3,3]
a[b]
# Output
[1] 1 14 27
Is there any "effective and easy" way to do a similar slice but keep some dimensions free?
That is, slice an n-dimensional array with an m * (n-i) matrix and
get an (i+1)-dimensional array as the result.
a <- array(1:27,c(3,3,3))
b <- matrix(rep(1:2,2),2)
# This will return a vector of the elements a[1], a[2], a[1] and a[2]
a[b]
# Output
[1] 1 2 1 2
# This will return the Cartesian product of the index vectors,
# that is, an array consisting of a[1,,1], a[1,,2], a[2,,1] and a[2,,2]
a[c(1,2),,c(1,2)]
# Output
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
The desired result would be for the last command to return an array
with only a[1,,1] and a[2,,2].
For now I solve this problem with a for loop and abind, but I'm sure there must be a better way.
# Desired functionality
a <- array(1:27,c(3,3,3))
b <- array(c(c(1,2),c(1,2)),c(2,2))
sliceem(a,b,freeDimension=2)
# Desired output (In this case rbind(a[1,,1],a[2,,2]) )
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 11 14 17
I think this is the cleanest way -- making a separate function:
slicem <- function(a,idx,drop=FALSE) do.call(`[`,c(list(a),idx,list(drop=drop)))
# usage for OP's example
a <- array(1:27, c(3,3,3))
idx <- list(1:2, TRUE, 1:2)
slicem(a,idx)
which gives
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
You have to write TRUE for each dimension that you aren't selecting from.
Following the OP's new expectations...
library(abind)
nistfun <- function(a,list_o_idx,drop=FALSE){
lens <- lengths(list_o_idx)
do.call(abind, lapply(seq.int(max(lens)), function(i)
slicem(a, mapply(`[`, list_o_idx, pmin(lens,i), SIMPLIFY=FALSE), drop=drop)
))
}
# usage for OP's new example
nistfun(a, idx)
# , , 1
#
# [,1] [,2] [,3]
# [1,] 1 4 7
#
# , , 2
#
# [,1] [,2] [,3]
# [1,] 11 14 17
Now, any non-TRUE indices must have the same length, since they will be matched up.
abind is used here instead of rbind (see an earlier edit on this answer) because it is the only sensible general way to think about slicing up an array. If you really want to drop dimensions, it's quite ambiguous which should be dropped and how, so the vector alone is returned:
nistfun(a, idx, drop=TRUE)
# [1] 1 4 7 11 14 17
If you want to throw this back into an array of some sort, you can do that after the fact:
matrix(nistfun(a, idx), max(lengths(idx)), dim(a)[sapply(idx, isTRUE)], byrow=TRUE)
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 11 14 17
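For the specific case in the question (pairing up the first and third indices and leaving the second dimension free), a base R alternative is to let mapply walk over the index pairs. This is just a sketch; the names i and k are made up here.

```r
a <- array(1:27, c(3, 3, 3))
i <- c(1, 2)  # indices into the first dimension
k <- c(1, 2)  # matching indices into the third dimension

# mapply returns one column per (i, k) pair, so transpose to get one row per pair
res <- t(mapply(function(i, k) a[i, , k], i, k))
res
```

This assumes exactly one free dimension; with more free dimensions the slices are no longer vectors and the abind approach above is the more general one.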
I was wondering what the ways are to construct a dynamic-size array in R.
For example, I want to construct an n-vector whose dimension n is dynamically determined. The following code works:
> x=NULL
> n=2;
> for (i in 1:n) x[i]=i;
> x
[1] 1 2
As another example, I want to construct an n by 2 matrix where the number of rows n is dynamically determined. But I fail even at assigning the first row:
> tmp=c(1,2)
> x=NULL
> x[1,]=tmp
Error in x[1, ] = tmp : incorrect number of subscripts on matrix
> x[1,:]=tmp
Error: unexpected ':' in "x[1,:"
Thanks and regards!
I think the answers you are looking for are rbind() and cbind():
> x=NULL # could also use x <- c()
> rbind(x, c(1,2))
[,1] [,2]
[1,] 1 2
> x <- rbind(x, c(1,2))
> x <- rbind(x, c(1,2)) # now extend row-wise
> x
[,1] [,2]
[1,] 1 2
[2,] 1 2
> x <- cbind(x, c(1,2)) # or column-wise
> x
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 1 2 2
The strategy of trying to assign to "new indices" on the fly as you attempted can be done in some languages but cannot be done that way in R.
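When the rows arrive one at a time inside a loop, calling rbind on every iteration copies the matrix each time. A common alternative, sketched here with made-up values, is to collect the rows in a list and bind them once at the end:

```r
n <- 4
rows <- vector("list", n)  # the list can also be grown with rows[[i]] <- ...

for (i in seq_len(n)) {
  rows[[i]] <- c(i, i^2)   # placeholder for whatever produces a row
}

# Bind all rows in a single call
x <- do.call(rbind, rows)
x
```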
You can also use sparse matrices provided in the Matrix package. They would allow assignments of the form M <- sparseMatrix(i=200, j=50, x=234) resulting in a single value at row 200, column 50 and 0's everywhere else.
require(Matrix)
M <- sparseMatrix(i=200, j=50, x=234)
M[1,1]
# [1] 0
M[200, 50]
# [1] 234
But I think the use of sparse matrices is best reserved for later use after mastering regular matrices.
It is possible to set the dimensions of the array after filling it as a one-dimensional vector.
Emulating the one-dimensional snippet from the question, here is how it can be done with higher dimensions.
> x=c()
> tmp=c(1,2)
> n=6
> for (i in seq(1, by=2, length=n)) x[i:(i+1)] =tmp;
> dim(x) = c(2,n)
> x
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 1 1 1 1 1
[2,] 2 2 2 2 2 2
>
Rather than using i:(i+1) as index, it may be preferable to use seq(i, length=2) or better yet, seq(i, length=length(tmp)) for a more generic approach, as illustrated below (for a 4 x 7 array example)
> x=c()
> tmp=c(1,2,3,4)
> n=7
> for (i in seq(1, by=length(tmp), length=n))
x[seq(i, length=length(tmp))] = tmp;
> dim(x) = c(length(tmp),n)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 1 1 1 1 1 1
[2,] 2 2 2 2 2 2 2
[3,] 3 3 3 3 3 3 3
[4,] 4 4 4 4 4 4 4
>
We can also obtain a similar result by re-assigning x with cbind/rbind, as follows (note that this yields n + 1 rows, since x already holds one row before the loop starts).
> tmp=c(1,2)
> n=6
> x=rbind(tmp)
> for (i in 1:n) x=rbind(x, tmp);
> x
[,1] [,2]
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
tmp 1 2
Note: one can get rid of the "tmp" names (these are a side effect of the rbind), with
> dimnames(x)=NULL
You can rbind it:
tmp = c(1,2)
x = NULL
rbind(x, tmp)
I believe this is the approach you need:
arr <- array(1)
arr <- append(arr,3)
arr[1] <- 2
print(arr[1])
(found on rosettacode.org)
When I want to dynamically construct an array (matrix), I do it like so:
n <- 500
new.mtrx <- matrix(ncol = 2, nrow = n)
head(new.mtrx)
[,1] [,2]
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] NA NA
[5,] NA NA
[6,] NA NA
Your matrix is now ready to accept vectors.
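Filling the preallocated matrix might then look like the following sketch, where the values written into each row are placeholders:

```r
n <- 5
new.mtrx <- matrix(NA_real_, ncol = 2, nrow = n)

# Write one row per iteration into the preallocated matrix
for (i in seq_len(n)) {
  new.mtrx[i, ] <- c(i, 10 * i)
}
new.mtrx
```

This requires knowing n (or an upper bound for it) before the loop starts.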
Assuming you already have a vector, you pass that to the matrix() function. Notice how values are "broken" into the matrix (column wise). This can be changed with byrow argument.
matrix(letters, ncol = 2)
[,1] [,2]
[1,] "a" "n"
[2,] "b" "o"
[3,] "c" "p"
[4,] "d" "q"
[5,] "e" "r"
[6,] "f" "s"
[7,] "g" "t"
[8,] "h" "u"
[9,] "i" "v"
[10,] "j" "w"
[11,] "k" "x"
[12,] "l" "y"
[13,] "m" "z"
n = 5
x = c(1,2) %o% rep(1,n)
x
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 1 1 1
# [2,] 2 2 2 2 2
x = rep(1,n) %o% c(1,2)
x
# [,1] [,2]
# [1,] 1 2
# [2,] 1 2
# [3,] 1 2
# [4,] 1 2
# [5,] 1 2
I have X, a three-dimensional array in R. I want to take a vector of indices indx (length equal to dim(X)[1]) and form a matrix where the first row is the first row of X[ , , indx[1]], the second row is the second row of X[ , , indx[2]], and so on.
For example, I have:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
R> indx <- c(2, 3, 1)
My desired output is
R> rbind(X[1, , 2], X[2, , 3], X[3, , 1])
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
As of now I'm using the inelegant (and slow) sapply(1:dim(X)[2], function(x) X[cbind(1:3, x, indx)]). Is there any way to do this using the built-in indexing functions? I had no luck experimenting with the matrix indexing methods described in ?Extract, but I may just be doing it wrong.
Maybe like this:
t(sapply(1:3, function(x) X[,,indx][x,,x]))
I may be answering the wrong question (I can't reconcile your first description and your sample output)... This produces your sample output, but I can't say that it's much faster without running it on your data.
do.call(rbind, lapply(1:dim(X)[1], function(i) X[i, , indx[i]]))
Matrix indexing to the rescue! No applys needed.
Figure out which indices you want:
n <- dim(X)[2]
foo <- cbind(rep(seq_along(indx),n),
rep(seq.int(n), each=length(indx)),
rep(indx,n))
The result is:
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 1 3
[3,] 3 1 1
[4,] 1 2 2
[5,] 2 2 3
[6,] 3 2 1
and use it as index, converting back to a matrix to make it look like your output.
> matrix(X[foo],ncol=n)
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
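The same idea can be wrapped in a small function. The sketch below uses expand.grid to enumerate the (row, column) pairs; pick_rows is a made-up name, not an existing function.

```r
X <- array(1:18, dim = c(3, 2, 3))
indx <- c(2, 3, 1)

# Hypothetical helper: row i of the result is row i of X[, , indx[i]]
pick_rows <- function(X, indx) {
  # All (row, column) pairs; the row index varies fastest
  grid <- expand.grid(i = seq_along(indx), j = seq_len(dim(X)[2]))
  # Three-column index matrix: (row, column, slice chosen for that row)
  idx <- cbind(grid$i, grid$j, indx[grid$i])
  matrix(X[idx], nrow = length(indx))
}

pick_rows(X, indx)
```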