`apply` the `rowMeans` across multiple sets of columns - r

Sorry, people, I can't see the forest for the trees. I searched a lot but couldn't find a solution. I want, e.g., the mean for every unit (potentially the rowMeans) of a subset of variables in a matrix (or potentially a dataframe) in R. I would like to select the columns using an indexing vector as in tapply, which I called a1 in the example below.
> set.seed(23958)
> (dat <- matrix(sample(0:3, 10, replace = TRUE), ncol = 5))
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 0 2 1
[2,] 2 1 1 2 1
> set.seed(6112)
> (a1 <- sample(1:2, 5, replace = TRUE))
[1] 1 1 2 2 1
The solution in this example should look like this, but of course I would like to do it in a more comprehensive way. I was thinking I should use a function from the apply family, but I could not find out which one.
> cbind(rowMeans(dat[, a1 == 1]), rowMeans(dat[, a1 == 2]))
[,1] [,2]
[1,] 2.000000 1.0
[2,] 1.333333 1.5

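You can still use tapply here, applied to the column indices. cbind gives one column per group, as in your expected output, and drop = FALSE keeps a group with a single column as a matrix so that rowMeans still works:
do.call(cbind,
tapply(seq_len(ncol(dat)), a1,
function(i) rowMeans(dat[, i, drop = FALSE])))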

If you transpose your data, you can use by:
t(do.call(rbind,by(t(dat),a1,colMeans)))
1 2
V1 2.000000 1.0
V2 1.333333 1.5

You could also use the aggregate function, dropping the grouping column that it adds to the result:
t(aggregate(t(dat), list(a1), mean)[-1])
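Another base R sketch of the same idea, not taken from the answers above: split the column indices by a1 and let sapply bind one column of row means per group. The data is hard-coded here (copied from the printed output in the question) because sample() can give different draws across R versions; idx and group_means are just illustrative names.

```r
# Values copied from the printed example in the question
dat <- matrix(c(2, 2, 3, 1, 0, 1, 2, 2, 1, 1), ncol = 5)
a1 <- c(1, 1, 2, 2, 1)

# One list element per group, holding the column indices in that group
idx <- split(seq_len(ncol(dat)), a1)

# drop = FALSE keeps a matrix even if a group has a single column
group_means <- sapply(idx, function(i) rowMeans(dat[, i, drop = FALSE]))
group_means
```

The result has one row per unit and one column per group, so no transpose is needed.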

Related

How to vectorize this operation?

I have an n x 3 x m array, call it I. It contains 3 columns, n rows (say n=10), and m slices. I have a computation that must be done to replace the third column in each slice based on the other 2 columns in the slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for(simIndex in 1:m){
I[,, simIndex] = insertNewRows(I[,,simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(FUN = insertNewRows, MARGIN = c(1,2,3)) but that just produces gibberish..?
Thank you in advance!
IK
The question does not define the input, the transformation, or the result, so we can't really answer it. Here is an example of adding a row of ones to a[,,i] for each i, which may suggest how you could solve the problem yourself.
This is how you could use sapply, apply, plyr::aaply, reshaping using matrix/aperm and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
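Applied to the original question, the apply(a, 3, f) pattern might look like the sketch below. Since insertNewRows was never shown, replace_third is a hypothetical stand-in that draws a 0/1 value from a made-up probability (the product of the first two columns); only the reshaping pattern is the point.

```r
set.seed(42)
n <- 10; m <- 4
# Toy n x 3 x m array with entries in (0, 1); stands in for I in the question
I <- array(runif(n * 3 * m), dim = c(n, 3, m))

# Hypothetical stand-in for insertNewRows(): takes one n x 3 slice and
# returns it with the third column replaced
replace_third <- function(s) {
  p <- s[, 1] * s[, 2]   # assumed probability, for illustration only
  s[, 3] <- rbinom(nrow(s), size = 1, prob = p)
  s
}

# apply() over the third margin flattens each returned slice into a column,
# so the result has to be reshaped back to the original dimensions
out <- array(apply(I, 3, replace_third), dim = dim(I))

# If the computation itself can be vectorised, the loop can be skipped entirely
p_all <- I[, 1, ] * I[, 2, ]   # n x m matrix of probabilities
out2 <- I
out2[, 3, ] <- rbinom(length(p_all), size = 1, prob = p_all)
```

apply is itself a loop internally, so the first form is mostly tidier rather than faster. The second form is where a real speed-up would come from, provided the real insertNewRows can be expressed as whole-array operations.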

how to select neighbouring elements in a vector and put them into a list or matrix in R

I have a problem about how to select neighboring elements in a vector and put them into a list or matrix in R.
For example:
vl <- c(1,2,3,4,5)
I want to get the results like this:
1,2
2,3
3,4
4,5
The results can be in a list or matrix
I know we can use a loop to get the results, like this:
pl <- list()
# stop one element early so that vl[p + 1] never runs past the end
for (p in seq_len(length(vl) - 1)) {
pl[[p]] <- sort(c(vl[p], vl[p + 1]))
}
But my data is large, and the loop is relatively slow.
Is there any function to get results directly?
Many thanks!
We can use head and tail to ignore the last and first element respectively.
data.frame(a = head(vl, -1), b = tail(vl, -1))
# a b
#1 1 2
#2 2 3
#3 3 4
#4 4 5
EDIT
If the data needs to be sorted we can use apply row-wise to sort it.
vl <- c(2,5,3,1,6,4)
t(apply(data.frame(a = head(vl, -1), b = tail(vl, -1)), 1, sort))
# [,1] [,2]
#[1,] 2 5
#[2,] 3 5
#[3,] 1 3
#[4,] 1 6
#[5,] 4 6
You can do:
matrix(c(vl[-length(vl)], vl[-1]), ncol = 2)
[,1] [,2]
[1,] 1 2
[2,] 2 3
[3,] 3 4
[4,] 4 5
If you want to sort two columns rowwise, then you can use pmin() and pmax() which will be faster than using apply(x, 1, sort) with a large number of rows.
sapply(c(pmin, pmax), do.call, data.frame(vl[-length(vl)], vl[-1]))
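Written out step by step, the pmin()/pmax() idea is roughly the following (a, b and pairs are just illustrative names):

```r
vl <- c(2, 5, 3, 1, 6, 4)

a <- vl[-length(vl)]   # every element except the last
b <- vl[-1]            # every element except the first

# Elementwise minimum and maximum give each neighbouring pair in sorted order
pairs <- cbind(pmin(a, b), pmax(a, b))
pairs
```

Both pmin and pmax work on whole vectors at once, which is why this avoids the per-row function calls made by apply(x, 1, sort).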
The problem can also be solved by applying the sort() function on a rolling window of length 2:
vl <- c(2,5,3,1,6,4)
zoo::rollapply(vl, 2L, sort)
which returns a matrix as requested:
[,1] [,2]
[1,] 2 5
[2,] 3 5
[3,] 1 3
[4,] 1 6
[5,] 4 6
Note that this uses the modified input vector vl that the OP posted in the comments.
Besides zoo, there are also other packages which offer rollapply functions, e.g.,
t(rowr::rollApply(vl, sort, 2L, 2L))

trouble rearranging my 2x2 matrix in a simple way in R

I'm trying to turn
df<-matrix(1:4,nrow = 2,ncol = 2)
df
[,1] [,2]
[1,] 1 3
[2,] 2 4
into
matrix(c(2,4,1,3),nrow = 1,ncol = 4)
2 4 1 3
so that I can run it through a for loop to rbind many entries.
I've been trying
cbind(df[2, ], df[1, ])
but it's not working. Is there a simple way to do this that won't require me to separate the matrix and then bring it back together?
Here is another way. Without the call to matrix it returns a vector, not a matrix.
df <- matrix(1:4, 2)
matrix(c(t(df[nrow(df):1,])), 1)
# [,1] [,2] [,3] [,4]
#[1,] 2 4 1 3
We can use
t(c(t(df[nrow(df):1, ])))
# [,1] [,2] [,3] [,4]
#[1,] 2 4 1 3
Turning a comment into an answer, a fourth option is
rev(t(m[, ncol(m):1]))
# [1] 2 4 1 3
with
m <- matrix(1:4, 2)
Maybe you can try the code below
r <- unlist(rev(data.frame(t(df))))
or
r <- do.call(c,rev(split(df,1:nrow(df))))
or
r <- unlist(rev(split(df,1:nrow(df))))
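Since the stated goal is to rbind many such rows, one possible sketch wraps the reversal in a small helper and binds everything in one call instead of growing a matrix inside a for loop. flatten_rev_rows and mats are hypothetical names, and the second matrix is made-up sample data.

```r
# Reverse the row order, then read the matrix row by row into a vector
flatten_rev_rows <- function(m) c(t(m[nrow(m):1, , drop = FALSE]))

mats <- list(matrix(1:4, 2), matrix(5:8, 2))

# One output row per input matrix
res <- do.call(rbind, lapply(mats, flatten_rev_rows))
res
```

Binding once at the end avoids copying the growing result on every iteration, which tends to matter when there are many entries.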

How to apply a function on every element of all elements in a list in R

I have a list containing matrices of the same size in R. I would like to apply a function over the same element of all matrices. Example:
> a <- matrix(1:4, ncol = 2)
> b <- matrix(5:8, ncol = 2)
> c <- list(a,b)
> c
[[1]]
[,1] [,2]
[1,] 1 3
[2,] 2 4
[[2]]
[,1] [,2]
[1,] 5 7
[2,] 6 8
Now I want to apply the mean function and would like to get a matrix like that:
[,1] [,2]
[1,] 3 5
[2,] 4 6
One conceptual way to do this would be to sum up the matrices and then take the average value of each entry. Try using Reduce:
Reduce('+', c) / length(c)
Output:
[,1] [,2]
[1,] 3 5
[2,] 4 6
Another option is to construct an array and then use apply.
step 1: constructing the array.
Using the abind library and do.call, you can do this:
library(abind)
myArray <- do.call(function(...) abind(..., along=3), c)
Using base R, you can strip out the structure and then rebuild it like this:
myArray <- array(unlist(c), dim=c(dim(a), length(c)))
In both instances, these return the desired array
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
step 2: use apply to calculate the mean along the first and second dimensions.
apply(myArray, 1:2, mean)
[,1] [,2]
[1,] 3 5
[2,] 4 6
This will be more flexible than Reduce, since you can swap out many more functions, but it will be slower for this particular application.
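To illustrate the flexibility point, here is a small sketch that swaps mean for other summary functions on the same array. The list is called mats rather than c here, purely to avoid shadowing base::c; that renaming is mine, not the original poster's.

```r
a <- matrix(1:4, ncol = 2)
b <- matrix(5:8, ncol = 2)
mats <- list(a, b)

# Stack the matrices along a third dimension
arr <- array(unlist(mats), dim = c(dim(a), length(mats)))

elem_mean <- apply(arr, 1:2, mean)     # same values as Reduce(`+`, mats) / length(mats)
elem_max  <- apply(arr, 1:2, max)      # elementwise maximum across the list
elem_med  <- apply(arr, 1:2, median)   # elementwise median across the list
```

Reduce only helps when the summary can be built from a pairwise operation such as + or pmax, whereas apply accepts any function of a vector.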

Strange behavior of apply/ reverse function in R

I have a simple matrix:
mat = rbind(c(1:3),c(4:6),c(7:9))
mat
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 8 9
I want to now reverse the matrix row-wise. That is I want to obtain:
revMat
# [,1] [,2] [,3]
# [1,] 3 2 1
# [2,] 6 5 4
# [3,] 9 8 7
To do this I tried
apply(mat, 1, rev)
And the result was:
# [,1] [,2] [,3]
# [1,] 3 6 9
# [2,] 2 5 8
# [3,] 1 4 7
I find this to be extremely strange. It's like the rows are reversed and then the final matrix is transposed. I don't understand why. If I try simply, for instance,
apply(mat, 2, rev)
it gives me the expected reversal of each column
# [,1] [,2] [,3]
# [1,] 7 8 9
# [2,] 4 5 6
# [3,] 1 2 3
Therefore to obtain the final result I have to perform
t(apply(t(mat), 2, rev))
Thus obtaining the required matrix is NOT a problem for me, but I don't understand the "anomaly" in the behavior of apply/ reverse. Can anyone explain this to me?
Edit: To make the distinction clear, I already know how to do the reversal. I want to know WHY this happens. The how-to is covered by many earlier questions, including
How to reverse a matrix in R?
apply always puts the result of each call to FUN along the first dimension of its return value, so each reversed row comes back as a column. See ?apply for more information. Assuming this input:
mat <- matrix(1:9, 3, byrow = TRUE)
here are some alternatives:
1) transpose
t(apply(mat, 1, rev))
2) avoid apply with indexing
mat[, 3:1]
3) iapply An idempotent apply was posted here:
https://stat.ethz.ch/pipermail/r-help/2006-January/086064.html
Using that we have:
iapply(mat, 1, rev)
There was also an idempotent apply, iapply, in version 0.8.0 of the reshape package (but not in the latest version of reshape): https://cran.r-project.org/src/contrib/Archive/reshape/
4) rollapply rollapply in the zoo package can be used:
library(zoo)
rollapply(mat, 1, rev, by.column = FALSE)
5) tapply The tapply expression here returns a list giving us the opportunity to put it together in the way we want -- in this case using rbind:
do.call("rbind", tapply(mat, row(mat), rev))
6) multiply by a reverse diagonal matrix Since rev is a linear operator it can be represented by a matrix:
mat %*% apply(diag(3), 1, rev)
or
mat %*% (row(mat) + col(mat) == 3+1)
If you look at the help for apply(), this is exactly the behavior you would expect:
Value
If each call to FUN returns a vector of length n, then apply returns
an array of dimension c(n, dim(X)[MARGIN]) if n > 1.
A nice option to do what you want is to use indexing:
mat[,ncol(mat):1]
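A quick way to see the rule from ?apply in action (a sketch; range is used only because it returns a vector of length 2, which makes the shape of the result easy to spot):

```r
mat <- matrix(1:9, 3, byrow = TRUE)

# Each call to FUN on a row returns a vector of length n = 2, so the
# result has dimension c(2, 3): one column per row of mat
r <- apply(mat, 1, range)
dim(r)

# The same rule turns each reversed row into a column, hence the t()
rev_rows <- t(apply(mat, 1, rev))
all(rev_rows == mat[, ncol(mat):1])
```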
