Unexpected apply function behaviour in R - r

I've discovered a surprising behaviour of apply that I wonder if anyone can explain. Let's take a simple matrix:
> (m = matrix(1:8,ncol=4))
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
We can flip it vertically thus:
> apply(m, MARGIN=2, rev)
[,1] [,2] [,3] [,4]
[1,] 2 4 6 8
[2,] 1 3 5 7
This applies the rev() vector reversal function iteratively to each column. But when we try to apply rev by row we get:
> apply(m, MARGIN=1, rev)
[,1] [,2]
[1,] 7 8
[2,] 5 6
[3,] 3 4
[4,] 1 2
.. a 90 degree anti-clockwise rotation! Apply delivers the same result using FUN=function(v) {v[length(v):1]} so it is definitely not rev's fault.
Any explanation for this?

This is because apply builds its result column by column, and you're iterating over the rows.
Each row is passed to the function in turn, and each returned vector then becomes a column of the result.
Using print as the function shows what is being passed to rev at each iteration:
x <- apply(m, 1, print)
[1] 1 3 5 7
[1] 2 4 6 8
That is, each call to print is passed a vector. The function is called twice, once with c(1,3,5,7) and once with c(2,4,6,8).
Reversing these gives c(7,5,3,1) and c(8,6,4,2), then these are used as the columns of the return matrix, giving the result that you see.
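To make the assembly step concrete, here is a small sketch that rebuilds the same result by hand. It is only an illustration of the layout, not apply's actual internals; the names reversed_rows, by_hand and via_apply are made up for this example.

```r
m <- matrix(1:8, ncol = 4)

# Reverse each row separately; every call sees a plain vector
reversed_rows <- lapply(seq_len(nrow(m)), function(i) rev(m[i, ]))

# Bind the reversed vectors together as columns, mirroring apply's layout
by_hand <- do.call(cbind, reversed_rows)

# For comparison: the call from the question
via_apply <- apply(m, MARGIN = 1, rev)

dim(by_hand)    # 4 rows, 2 columns
dim(via_apply)  # 4 rows, 2 columns
```

Both objects hold the reversed rows as columns, which is why the result looks rotated rather than flipped.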

The documentation states that
If each call to FUN returns a vector of length n, then apply returns
an array of dimension c(n, dim(X)[MARGIN]) if n > 1.
From that perspective, this behaviour is not a bug at all; that's how it is intended to work.
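The quoted rule can be checked directly on m. A minimal sketch, with by_row and by_col as illustrative names:

```r
m <- matrix(1:8, ncol = 4)   # dim(m) is c(2, 4)

# MARGIN = 1: each call returns n = 4 values,
# so the result has dimension c(4, dim(m)[1]), i.e. c(4, 2)
by_row <- apply(m, 1, rev)

# MARGIN = 2: each call returns n = 2 values,
# so the result has dimension c(2, dim(m)[2]), i.e. c(2, 4)
by_col <- apply(m, 2, rev)

dim(by_row)
dim(by_col)
```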
One may wonder why this is chosen to be a default setting, instead of preserving the structure of the original matrix. Consider the following example:
> apply(m, 1, quantile)
[,1] [,2]
0% 1.0 2.0
25% 2.5 3.5
50% 4.0 5.0
75% 5.5 6.5
100% 7.0 8.0
> apply(m, 2, quantile)
[,1] [,2] [,3] [,4]
0% 1.00 3.00 5.00 7.00
25% 1.25 3.25 5.25 7.25
50% 1.50 3.50 5.50 7.50
75% 1.75 3.75 5.75 7.75
100% 2.00 4.00 6.00 8.00
> all(rownames(apply(m, 2, quantile)) == rownames(apply(m, 1, quantile)))
[1] TRUE
Consistent? Indeed, why would we expect anything else?

rev returns a plain vector without a dim attribute, so passing it a row (a 1 x n matrix) does not give a row back.
t(c(1,2,3,4))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
rev(t(c(1,2,3,4)))
[1] 4 3 2 1
which is not what you expected
[,1] [,2] [,3] [,4]
[1,] 4 3 2 1
So, you'll have to transpose the result of apply to get what you want
t(apply(m, MARGIN=1, rev))
[,1] [,2] [,3] [,4]
[1,] 7 5 3 1
[2,] 8 6 4 2
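If the goal is only to reverse the order of the columns, plain indexing gives the same matrix without apply or t. This is a sketch of an alternative, not something proposed in the answers above; flipped is an illustrative name.

```r
m <- matrix(1:8, ncol = 4)

# Reverse the column order directly; drop = FALSE keeps the matrix
# shape even when m has a single row
flipped <- m[, rev(seq_len(ncol(m))), drop = FALSE]

flipped
```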

Related

How to apply a function on every element of all elements in a list in R

I have a list containing matrices of the same size in R. I would like to apply a function over the same element of all matrices. Example:
> a <- matrix(1:4, ncol = 2)
> b <- matrix(5:8, ncol = 2)
> c <- list(a,b)
> c
[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[[2]]
[,1] [,2]
[1,] 5 7
[2,] 6 8
Now I want to apply the mean function and would like to get a matrix like that:
[,1] [,2]
[1,] 3 5
[2,] 4 6
One conceptual way to do this would be to sum up the matrices and then take the average value of each entry. Try using Reduce:
Reduce('+', c) / length(c)
Output:
[,1] [,2]
[1,] 3 5
[2,] 4 6
Another option is to construct an array and then use apply.
step 1: constructing the array.
Using the abind library and do.call, you can do this:
library(abind)
myArray <- do.call(function(...) abind(..., along=3), c)
Using base R, you can strip out the structure and then rebuild it like this:
myArray <- array(unlist(c), dim=c(dim(a), length(c)))
In both instances, these return the desired array
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
step 2: use apply to calculate the mean along the first and second dimensions.
apply(myArray, 1:2, mean)
[,1] [,2]
[1,] 3 5
[2,] 4 6
This will be more flexible than Reduce, since you can swap out many more functions, but it will be slower for this particular application.
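As a sketch of that flexibility, the same array can be summarised with other functions simply by changing the last argument. The list is named mats here (the question calls it c) only to avoid masking the c() function; the other names are illustrative too.

```r
a <- matrix(1:4, ncol = 2)
b <- matrix(5:8, ncol = 2)
mats <- list(a, b)

# Stack the matrices along a third dimension using base R only
arr <- array(unlist(mats), dim = c(dim(a), length(mats)))

# Swap in whichever summary function is needed
elementwise_mean <- apply(arr, 1:2, mean)
elementwise_max  <- apply(arr, 1:2, max)
elementwise_sd   <- apply(arr, 1:2, sd)
```

Reduce only works this neatly when the summary can be written as a pairwise operation such as '+', whereas apply accepts any function of a vector.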

apply and lapply in one function return an NAN

I have a function that returns a list of lists, and I would like to find the standard deviation of the matrices in its output. The output of my function is a list of two lists. I tried this code but it returns NaN. Since my function is complex, I am using this example from another question, since it is quite close to what I am trying to do.
> A <- matrix(c(1:9), 3, 3)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> B <- matrix(c(2:10), 3, 3)
> B
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
> my.list1 <- list(A, B)
so the mean of the first list is:
[,1] [,2] [,3]
[1,] 1.5 4.5 7.5
[2,] 2.5 5.5 8.5
[3,] 3.5 6.5 9.5
Then the standard deviation will be:
[,1] [,2] [,3]
[1,] 0.7071068 0.7071068 0.7071068
[2,] 0.7071068 0.7071068 0.7071068
[3,] 0.7071068 0.7071068 0.7071068
> c <- matrix(c(1:9), 3, 3)
> c
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> d <- matrix(c(2:10), 3, 3)
> d
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
> my.list2 <- list(c, d)
my.list <-list(my.list1,my.list2)
How can I get the element-by-element standard deviation of the matrices in each list?
Try ?rapply
> rapply(my.list, sd)
[1] 2.738613 2.738613 2.738613 2.738613
You could bind your lists into an array, or perhaps make your function return an array(?), then you could use apply() to apply your chosen functions...
A <- matrix(1:9, 3, 3)
B <- matrix(2:10, 3, 3)
my.list1 <- list(A, B)
c <- matrix(1:9, 3, 3)
d <- matrix(2:10, 3, 3)
my.list2 <- list(c, d)
Create an array from all 4 matrices
my.array1 <- abind::abind(c(my.list1, my.list2), along = 3)
Find the mean() and sd() across the third dimension
apply(my.array1, c(1, 2), mean)
apply(my.array1, c(1,2), sd)
Output (of the mean() call)
[,1] [,2] [,3]
[1,] 1.5 4.5 7.5
[2,] 2.5 5.5 8.5
[3,] 3.5 6.5 9.5
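If the standard deviation is wanted separately for each inner list, as in the expected output of the question, one option is to stack each inner list on its own. The sketch below uses base R only; elementwise is a hypothetical helper name, not part of any package.

```r
A <- matrix(1:9, 3, 3)
B <- matrix(2:10, 3, 3)
my.list <- list(list(A, B), list(A, B))

# Hypothetical helper: stack one list of equally sized matrices into a
# 3-d array, then apply f across the third dimension
elementwise <- function(mats, f) {
  arr <- simplify2array(mats)
  apply(arr, c(1, 2), f)
}

# One 3 x 3 matrix of standard deviations per inner list
sd_per_list <- lapply(my.list, elementwise, f = sd)
sd_per_list[[1]]
```

Each cell of sd_per_list[[1]] is the sd of two values that differ by 1, which matches the 0.7071068 expected in the question.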

R: subsetting N-dimensional arrays

Consider the following 3-dimensional array:
set.seed(123)
arr = array(sample(c(1:10)), dim=c(3,4,2))
which yields
> arr
, , 1
[,1] [,2] [,3] [,4]
[1,] 10 9 8 2
[2,] 5 1 4 10
[3,] 6 7 3 5
, , 2
[,1] [,2] [,3] [,4]
[1,] 6 7 3 5
[2,] 9 8 2 6
[3,] 1 4 10 9
I'd like to subset it like
arr[c(1,2), c(2,4), c(1)]
but the catch is that I don't know (a) which indices or (b) which dimension the indices are.
What is the best way to access an N-dimensional array with index variables?
ll = list(c(1,2), c(2,4), c(1))
arr[ll] # doesn't work
arr[expand.grid(ll)] # doesn't work
# ..what else?
Use do.call, such as:
do.call(`[`, c(list(arr), ll))
or more cleanly, using a wrapper function:
getArr <- function(...)
`[`(arr, ...)
do.call(getArr, ll)
[,1] [,2]
[1,] 9 2
[2,] 1 10
There is the asub function from the abind package:
library(abind)
asub(arr, ll)
which can also do a lot more, in particular extract along a subset of the dimensions (https://stackoverflow.com/a/17752012/1201032). Worth having in your toolbox.
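The do.call approach can be wrapped so that it also forwards drop, which matters when a selected dimension has length 1. This is a sketch under assumptions: subset_by_list is a hypothetical helper, and the array is simple illustrative data rather than the sampled one from the question.

```r
arr <- array(1:24, dim = c(3, 4, 2))   # illustrative data

# Hypothetical helper: index an array with a list that holds one
# entry per dimension
subset_by_list <- function(x, idx, drop = TRUE) {
  stopifnot(length(idx) == length(dim(x)))
  do.call(`[`, c(list(x), idx, list(drop = drop)))
}

ll <- list(c(1, 2), c(2, 4), 1)
subset_by_list(arr, ll)                 # same as arr[c(1, 2), c(2, 4), 1]
subset_by_list(arr, ll, drop = FALSE)   # keeps all three dimensions

# TRUE selects everything along a dimension
subset_by_list(arr, list(TRUE, c(2, 4), 1))
```

Using TRUE as a placeholder avoids having to construct an empty argument when a dimension should be left unrestricted.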

colsum rowsum populating matrix

For each cell in a matrix, I'm trying to write the smaller of its row sum and its column sum into a new matrix of the same dimensions.
For example:
say I have a matrix x which looks like this:
x <- matrix(1:6, 2)
x
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
Its rowsum and colsum are:
rowSums(x)
[1] 9 12
colSums(x)
[1] 3 7 11
so based on that info, the new matrix should look like this:
[,1] [,2] [,3]
[1,] 3 7 9
[2,] 3 7 11
I've been thinking about using apply but I do not know how I can write an if statement to write the smallest value from either rowsum or colsum for each cell entry. Any ideas?
This can be thought of as an outer product of the row and column sums, where the function takes the minimum value:
outer(rowSums(x), colSums(x), FUN=pmin)
## [,1] [,2] [,3]
## [1,] 3 7 9
## [2,] 3 7 11
x[] <- pmin(rep(colSums(x), each = nrow(x)), rep(rowSums(x), times = ncol(x)))
x
# [,1] [,2] [,3]
# [1,] 3 7 9
# [2,] 3 7 11
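For comparison, here is an explicit loop that spells out what both vectorised answers compute. It is only a sketch to make the rule visible; rs, cs and result are illustrative names.

```r
x <- matrix(1:6, 2)
rs <- rowSums(x)
cs <- colSums(x)

# Cell (i, j) gets the smaller of row sum i and column sum j
result <- matrix(NA_real_, nrow(x), ncol(x))
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    result[i, j] <- min(rs[i], cs[j])
  }
}
result
```

The rep() version works for the same reason: R stores matrices column by column, so repeating each column sum nrow(x) times and cycling through the row sums lines both vectors up with the cells.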

Functional way to stack list of 2d matrices into 3d matrix

After a clever lapply, I'm left with a list of 2-dimensional matrices.
For example:
set.seed(1)
test <- replicate( 5, matrix(runif(25),ncol=5), simplify=FALSE )
> test
[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.8357088 0.29589546 0.9994045 0.2862853 0.6973738
[2,] 0.2377494 0.14704832 0.0348748 0.7377974 0.6414624
[3,] 0.3539861 0.70399206 0.3383913 0.8340543 0.6439229
[4,] 0.8568854 0.10380669 0.9150638 0.3142708 0.9778534
[5,] 0.8537634 0.03372777 0.6172353 0.4925665 0.4147353
[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 0.1194048 0.9833502 0.9674695 0.6687715 0.1928159
[2,] 0.5260297 0.3883191 0.5150718 0.4189159 0.8967387
[3,] 0.2250734 0.2292448 0.1630703 0.3233450 0.3081196
[4,] 0.4864118 0.6232975 0.6219023 0.8352553 0.3633005
[5,] 0.3702148 0.1365402 0.9859542 0.1438170 0.7839465
[[3]]
...
I'd like to turn that into a 3-dimensional array:
set.seed(1)
replicate( 5, matrix(runif(25),ncol=5) )
Obviously, if I'm using replicate I can just turn on simplify, but sapply does not simplify the result properly, and stack fails utterly. do.call(rbind,mylist) turns it into a 2d matrix rather than 3d array.
I can do this with a loop, but I'm looking for a neat and functional way to handle it.
The closest way I've come up with is:
array( do.call( c, test ), dim=c(dim(test[[1]]),length(test)) )
But I feel like that's inelegant, because it disassembles and then reassembles the array attributes of the vectors, and needs a lot of testing to make safe (e.g. checking that the dimensions of each element are the same).
Try this:
simplify2array(test)
You can use the abind package and then use abind(test, along = 3)
library(abind)
testArray <- abind(test, along = 3)
Or you could use simplify = 'array' in a call to sapply (instead of lapply). simplify = 'array' is not the same as simplify = TRUE, because it sets the higher argument of simplify2array to TRUE.
e.g.
foo <- function(x) matrix(1:10, ncol = 5)
# the default is simplify = TRUE
sapply(1:5, foo)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 1 1
[2,] 2 2 2 2 2
[3,] 3 3 3 3 3
[4,] 4 4 4 4 4
[5,] 5 5 5 5 5
[6,] 6 6 6 6 6
[7,] 7 7 7 7 7
[8,] 8 8 8 8 8
[9,] 9 9 9 9 9
[10,] 10 10 10 10 10
# which is *not* what you want
# so set simplify = 'array'
sapply(1:5, foo, simplify = 'array')
, , 1
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
, , 2
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
, , 3
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
, , 4
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
, , 5
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
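A related option, not mentioned in the answers above, is vapply with a matrix as FUN.VALUE. In that case the result is an array whose dimensions are those of the template followed by the length of the input, and the shape of every return value is declared up front. A minimal sketch reusing foo from the example:

```r
foo <- function(x) matrix(1:10, ncol = 5)

# The template declares that every call returns a 2 x 5 integer matrix
res <- vapply(1:5, foo, FUN.VALUE = matrix(0L, nrow = 2, ncol = 5))

dim(res)      # 2 5 5
res[, , 3]    # the matrix returned by the third call
```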
An array is simply an atomic vector with dimensions. Each of the matrix components of test is really just a vector with dimensions too. Hence the simplest solution I can think of is to unroll the list test into a vector and convert that to an array using array and suitably supplied dimensions.
set.seed(1)
foo <- replicate( 5, matrix(runif(25),ncol=5) )
tmp <- array(unlist(test), dim = c(5,5,5))
> all.equal(foo, tmp)
[1] TRUE
> is.array(tmp)
[1] TRUE
> dim(tmp)
[1] 5 5 5
If you don't want to hardcode the dimensions, we have to make some assumptions but can easily fill in the dimension from test, e.g.
tmp2 <- array(unlist(test), dim = c(dim(test[[1]]), length(test)))
> all.equal(foo, tmp2)
[1] TRUE
This assumes that the dimensions of each component are all the same, but then I don't see how you could put sub-matrices into a 3-d array if that condition doesn't hold.
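If that assumption needs to be enforced rather than trusted, the check can be made explicit before stacking. The function below is a sketch with a hypothetical name (stack_matrices); it fails early when the matrices differ in shape.

```r
# Hypothetical helper: stack a list of matrices into a 3-d array,
# stopping if the matrices do not all share the same dimensions
stack_matrices <- function(mats) {
  stopifnot(length(mats) > 0, all(vapply(mats, is.matrix, logical(1))))
  dims <- lapply(mats, dim)
  same <- vapply(dims, identical, logical(1), dims[[1]])
  if (!all(same)) stop("all matrices must have the same dimensions")
  array(unlist(mats), dim = c(dims[[1]], length(mats)))
}

set.seed(1)
test <- replicate(5, matrix(runif(25), ncol = 5), simplify = FALSE)
stacked <- stack_matrices(test)
dim(stacked)
```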
This may seem hacky, to unroll the list, but this is simply exploiting how R handles matrices and arrays as vectors with dimensions.
test2 <- unlist(test)
dim(test2) <- c(dim(test[[1]]),5)
or if you do not know the expected size ahead of time:
dim3 <- c(dim(test[[1]]), length(test2)/prod(dim(test[[1]])))
dim(test2) <- dim3
