Form matrix from rows in 3-dimensional array - r

I have X, a three-dimensional array in R. I want to take a vector of indices indx (length equal to dim(X)[1]) and form a matrix where the first row is the first row of X[ , , indx[1]], the second row is the second row of X[ , , indx[2]], and so on.
For example, I have:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
R> indx <- c(2, 3, 1)
My desired output is
R> rbind(X[1, , 2], X[2, , 3], X[3, , 1])
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6
As of now I'm using the inelegant (and slow) sapply(1:dim(X)[2], function(x) X[cbind(1:3, x, indx)]). Is there any way to do this using the built-in indexing functions? I had no luck experimenting with the matrix indexing methods described in ?Extract, but I may just be doing it wrong.

Maybe like this:
t(sapply(1:3,function(x) X[,,idx][x,,x]))

I may be answering the wrong question (I can't reconcile your first description and your sample output)... This produces your sample output, but I can't say that it's much faster without running it on your data.
do.call(rbind, lapply(1:dim(X)[1], function(i) X[i, , indx[i]]))

Matrix indexing to the rescue! No applys needed.
Figure out which indices you want:
n <- dim(X)[2]
foo <- cbind(rep(seq_along(indx),n),
rep(seq.int(n), each=length(indx)),
rep(indx,n))
(the result is this)
[,1] [,2] [,3]
[1,] 1 1 2
[2,] 2 1 3
[3,] 3 1 1
[4,] 1 2 2
[5,] 2 2 3
[6,] 3 2 1
and use it as index, converting back to a matrix to make it look like your output.
> matrix(X[foo],ncol=n)
[,1] [,2]
[1,] 7 10
[2,] 14 17
[3,] 3 6

Related

apply function to subsets of each row in R

I am struggling to find a way to apply a specific function using apply, only to a "chunk" of a specific row.
For instance, I have a matrix:
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
[,1] [,2] [,3] [,4]
[1,] 5 3 10 1
[2,] 12 2 7 11
[3,] 4 8 9 6
And I would like to end up with a new matrix, made up of a sum of the first and last two values in each row. Like so:
[,1] [,2]
[1,] 8 11
[2,] 14 18
[3,] 12 15
I have tried something like this:
chunks<-c("1:2","3:4")
sumchunks<-function(x,chunks){
apply(x,1,
function(row){
for (i in chunks){
v<-sum(row[chunks[i]])
}})
}
But it doesn't work at all. Any suggestion on successful ways?
Thank you.
You can do:
chunks <- list(1:2, 3:4)
sumchunks <- function(x, chunks) sapply(chunks, function(ch) sum(x[ch]))
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
apply(x, 1, sumchunks, chunks=chunks)
# [,1] [,2] [,3]
# [1,] 8 14 12
# [2,] 11 18 15
Eventually you want to transpose the result.
Here is a vectorized variant:
chunks <- list(1:2, 3:4)
x <- matrix(c(5,12,4,3,2,8,10,7,9,1,11,6),nrow=3)
sapply(chunks, function(ch) rowSums(x[,ch]))
# [,1] [,2]
# [1,] 8 11
# [2,] 14 18
# [3,] 12 15
We can convert to array and then do
t(apply(array(x, c(3, 2, 2)), 1, colSums))
Or
sapply(seq(1, ncol(x), 2), function(i) rowSums(x[,i:(i+1)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15
like this?
x <- matrix(sample(1:12),nrow=3)
f = function(s) {
c(sum(s[1:2]), sum(s[3:4]))
}
t(apply(x, 1, f))
rowSums was built to sum over rows so should be quite fast. You can limit the columns you want to sum over and then cbind them to get what you want:
cbind(rowSums(x[,c(1,2)]), rowSums(x[,c(3,4)]))
# [,1] [,2]
#[1,] 8 11
#[2,] 14 18
#[3,] 12 15

How to slice a n dimensional array with a m*(n-i) dimensional matrix?

If i have a n dimensional array it can be sliced by a m * n matrix like this
a <- array(1:27,c(3,3,3))
b <- matrix(rep(1:3,3),3)
# This will return the index a[1,1,1] a[2,2,2] and a[3,3,3]
a[b]
# Output
[1] 1 14 27
Is there any "effective and easy" way to do a similar slice but to keep some dimensions free?
That is slice a n dimensional array with a m * (n-i) dimensional array and
get a i+1 dimensional array as result.
a <- array(1:27,c(3,3,3))
b <- matrix(rep(1:2,2),2)
# This will return a vector of the index a[1] a[2] a[1] and a[2]
a[b]
# Output
[1] 1 2 1 2
# This will return the indexes of the cartesian product between the vectors,
# that is a array consisting of a[1,,1] a[1,,2] a[2,,1] and a[2,,2]
a[c(1,2),,c(1,2)]
# Output
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
The desired result should be if the last command returned an array
with a[1,,1] and a[2,,2].
For now I solve this the problem with a for loop and abind but I'm sure there must be a better way.
# Desired functionality
a <- array(1:27,c(3,3,3))
b <- array(c(c(1,2),c(1,2)),c(2,2))
sliceem(a,b,freeDimension=2)
# Desired output (In this case rbind(a[1,,1],a[2,,2]) )
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 11 14 17
I think this is the cleanest way -- making a separate function:
slicem <- function(a,idx,drop=FALSE) do.call(`[`,c(list(a),idx,list(drop=drop)))
# usage for OP's example
a <- array(1:27, c(3,3,3))
idx <- list(1:2, TRUE, 1:2)
slicem(a,idx)
which gives
, , 1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
, , 2
[,1] [,2] [,3]
[1,] 10 13 16
[2,] 11 14 17
You have to write TRUE for each dimension that you aren't selecting from.
Following the OP's new expectations...
library(abind)
nistfun <- function(a,list_o_idx,drop=FALSE){
lens <- lengths(list_o_idx)
do.call(abind, lapply(seq.int(max(lens)), function(i)
slicem(a, mapply(`[`, list_o_idx, pmin(lens,i), SIMPLIFY=FALSE), drop=drop)
))
}
# usage for OP's new example
nistfun(a, idx)
# , , 1
#
# [,1] [,2] [,3]
# [1,] 1 4 7
#
# , , 2
#
# [,1] [,2] [,3]
# [1,] 11 14 17
Now, any non-TRUE indices must have the same length, since they will be matched up.
abind is used here instead of rbind (see an earlier edit on this answer) because it is the only sensible general way to think about slicing up an array. If you really want to drop dimensions, it's quite ambiguous which should be dropped and how, so the vector alone is returned:
nistfun(a, idx, drop=TRUE)
# [1] 1 4 7 11 14 17
If you want to throw this back into an array of some sort, you can do that after the fact:
matrix( nistfun(a, idx), max(lengths(idx)), dim(a)[sapply(idx,isTRUE)]), byrow=TRUE)
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 11 14 17

R: subsetting N-dimensional arrays

Consider the following 3-dimensional array:
set.seed(123)
arr = array(sample(c(1:10)), dim=c(3,4,2))
which yields
> arr
, , 1
[,1] [,2] [,3] [,4]
[1,] 10 9 8 2
[2,] 5 1 4 10
[3,] 6 7 3 5
, , 2
[,1] [,2] [,3] [,4]
[1,] 6 7 3 5
[2,] 9 8 2 6
[3,] 1 4 10 9
I'd like to subset it like
arr[c(1,2), c(2,4), c(1)]
but the catch is that I don't know (a) which indices or (b) which dimension the indices are.
What is the best way to access an N-dimensional array with index variables?
ll = list(c(1,2), c(2,4), c(1))
arr[ll] # doesn't work
arr[grid.expand(ll)] # doesn't work
# ..what else?
use do.call, such as:
do.call(`[`, c(list(arr), ll))
or more cleanly, using a wrapper function:
getArr <- function(...)
`[`(arr, ...)
do.call(getArr, ll)
[,1] [,2]
[1,] 10 5
[2,] 7 3
There is the asub function from the abind package:
library(abind)
asub(arr, ll)
which can also do a lot more, in particular extract along a subset of the dimensions (https://stackoverflow.com/a/17752012/1201032). Worth having in your toolbox.

Build a square-ish matrix with a specified number of cells

I would like to write a function that transforms an integer, n, (specifying the number of cells in a matrix) into a square-ish matrix that contain the sequence 1:n. The goal is to make the matrix as "square" as possible.
This involves a couple of considerations:
How to maximize "square"-ness? I was thinking of a penalty equal to the difference in the dimensions of the matrix, e.g. penalty <- abs(dim(mat)[1]-dim(mat)[2]), such that penalty==0 when the matrix is square and is positive otherwise. Ideally this would then, e.g., for n==12 lead to a preference for a 3x4 rather than 2x6 matrix. But I'm not sure the best way to do this.
Account for odd-numbered values of n. Odd-numbered values of n do not necessarily produce an obvious choice of matrix (unless they have an integer square root, like n==9. I thought about simply adding 1 to n, and then handling as an even number and allowing for one blank cell, but I'm not sure if this is the best approach. I imagine it might be possible to obtain a more square matrix (by the definition in 1) by adding more than 1 to n.
Allow the function to trade-off squareness (as described in #1) and the number of blank cells (as described in #2), so the function should have some kind of parameter(s) to address this trade-off. For example, for n==11, a 3x4 matrix is pretty square but not as square as a 4x4, but the 4x4 would have many more blank cells than the 3x4.
The function needs to optionally produce wider or taller matrices, so that n==12 can produce either a 3x4 or a 4x3 matrix. But this would be easy to handle with a t() of the resulting matrix.
Here's some intended output:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Here's basically a really terrible start to this problem.
makemat <- function(n) {
n <- abs(as.integer(n))
d <- seq_len(n)
out <- d[n %% d == 0]
if(length(out)<2)
stop('n has fewer than two factors')
dim1a <- out[length(out)-1]
m <- matrix(1:n, ncol=dim1a)
m
}
As you'll see I haven't really been able to account for odd-numbered values of n (look at the output of makemat(7) or makemat(11) as described in #2, or enforce the "squareness" rule described in #1, or the trade-off between them as described in #3.
I think the logic you want is already in the utility function n2mfrow(), which as its name suggests is for creating input to the mfrow graphical parameter and takes an integer input and returns the number of panels in rows and columns to split the display into:
> n2mfrow(11)
[1] 4 3
It favours tall layouts over wide ones, but that is easily fixed via rev() on the output or t() on a matrix produced from the results of n2mfrow().
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
m <- matrix(seq_len(prod(dims)), nrow = dims[1], ncol = dims[2])
m
}
Notice I have to special-case n = 3 as we are abusing a function intended for another use and a 3x1 layout on a plot makes more sense than a 2x2 with an empty space.
In use we have:
> makemat(2)
[,1]
[1,] 1
[2,] 2
> makemat(3)
[,1] [,2]
[1,] 1 3
[2,] 2 4
> makemat(9)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> makemat(11)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> makemat(11, wide = TRUE)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
Edit:
The original function padded seq_len(n) with NA, but I realised the OP wanted to have a sequence from 1 to prod(nrows, ncols), which is what the version above does. The one below pads with NA.
makemat <- function(n, wide = FALSE) {
if(isTRUE(all.equal(n, 3))) {
dims <- c(2,2)
} else {
dims <- n2mfrow(n)
}
if(wide)
dims <- rev(dims)
s <- rep(NA, prod(dims))
ind <- seq_len(n)
s[ind] <- ind
m <- matrix(s, nrow = dims[1], ncol = dims[2])
m
}
I think this function implicitly satisfies your constraints. The parameter can range from 0 to Inf. The function always returns either a square matrix with sides of ceiling(sqrt(n)), or a (maybe) rectangular matrix with rows floor(sqrt(n)) and just enough columns to "fill it out". The parameter trades off the selection between the two: if it is less than 1, then the second, more rectangular matrices are preferred, and if greater than 1, the first, always square matrices are preferred. A param of 1 weights them equally.
makemat<-function(n,param=1,wide=TRUE){
if (n<1) stop('n must be positive')
s<-sqrt(n)
bottom<-n-(floor(s)^2)
top<-(ceiling(s)^2)-n
if((bottom*param)<top) {
rows<-floor(s)
cols<-rows + ceiling(bottom / rows)
} else {
cols<-rows<-ceiling(s)
}
if(!wide) {
hold<-rows
rows<-cols
cols<-hold
}
m<-seq.int(rows*cols)
dim(m)<-c(rows,cols)
m
}
Here is an example where the parameter is set to default, and equally trades off the distance equally:
lapply(c(2,3,9,11),makemat)
# [[1]]
# [,1] [,2]
# [1,] 1 2
#
# [[2]]
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
#
# [[3]]
# [,1] [,2] [,3]
# [1,] 1 4 7
# [2,] 2 5 8
# [3,] 3 6 9
#
# [[4]]
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
Here is an example of using the param with 11, to get a 4x4 matrix.
makemat(11,3)
# [,1] [,2] [,3] [,4]
# [1,] 1 5 9 13
# [2,] 2 6 10 14
# [3,] 3 7 11 15
# [4,] 4 8 12 16
What about something fairly simple and you can handle the exceptions and other requests in a wrapper?
library(taRifx)
neven <- 8
nodd <- 11
nsquareodd <- 9
nsquareeven <- 16
makemat <- function(n) {
s <- seq(n)
if( odd(n) ) {
s[ length(s)+1 ] <- NA
n <- n+1
}
sq <- sqrt( n )
dimx <- ceiling( sq )
dimy <- floor( sq )
if( dimx*dimy < length(s) ) dimy <- ceiling( sq )
l <- dimx*dimy
ldiff <- l - length(s)
stopifnot( ldiff >= 0 )
if( ldiff > 0 ) s[ seq( length(s) + 1, length(s) + ldiff ) ] <- NA
matrix( s, nrow = dimx, ncol = dimy )
}
> makemat(neven)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 NA
> makemat(nodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 NA
> makemat(nsquareodd)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 NA
[3,] 3 7 NA
[4,] 4 8 NA
> makemat(nsquareeven)
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16

Data structure to hold multiple matrices

I have an array of strings which are actually names of datasets. I perform several measures on each dataset and get result of each measure in a matrix.
I want to save the results of one dataset in some data structure.
So, for example:
We have a string "glass".
From measurements on dataset "glass" I get 3 matrices a,b,c.
How could I save a,b,c in one structure?
Thanks.
Use a list.
> mydata <- list()
> mydata[[1]] <- matrix(1:4, 2, 2)
> mydata[[2]] <- matrix(1:10, 5, 2)
> mydata[[3]] <- matrix(1:16, 4, 4)
> mydata
[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[[2]]
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
[[3]]
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
>
> # To access the first matrix in the list...
> mydata[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
See ?list for more information.
Since they are the same size you can choose either list or a array. Dason showed the list option.
a=matrix(rnorm(16),nrow=4)
b=matrix(rnorm(16),nrow=4)
d=matrix(rnorm(16),nrow=4)
glass=array(c(a,b,d),dim=c(4,4,3))

Resources