Calculate the average per row per set of columns in R

What I would like to do is the following. Given a matrix, for example mat <- matrix(1:100, nrow = 4), and a set of combinations of the columns, c_w <- combn(c(1,2,3,4), 2), I would like to calculate the row averages for each combination. For the first combination that is rowMeans(mat[,c_w[,1]]), for the second rowMeans(mat[,c_w[,2]]), and so on. So far so good: I can wrap this in a for loop and use rbind to combine the results into a results matrix. The problem is performance, so if possible I would like to do this in a vectorized manner. My question is:
can we do this without for loops in the R code?
Edit:
I would like the result in matrix form, where each column holds the means for one set. This can be achieved with some small additions to Arun's code. Please turn the comment into an answer so I can give you points :).
Thanks

We can use the FUN argument in combn to compute the rowMeans directly within the combn step, after subsetting the columns of 'mat' with the column indices that combn generates:
combn(1:4, 2, FUN=function(x) rowMeans(mat[,x]))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 5 7 7 9 11
#[2,] 4 6 8 8 10 12
#[3,] 5 7 9 9 11 13
#[4,] 6 8 10 10 12 14
Another option, if we already have the combn output, is to split 'c_w' by col(c_w), loop through the resulting list with sapply, subset 'mat' with each numeric index, and get the rowMeans:
sapply(split(c_w, col(c_w)), function(x) rowMeans(mat[,x]))
# 1 2 3 4 5 6
#[1,] 3 5 7 7 9 11
#[2,] 4 6 8 8 10 12
#[3,] 5 7 9 9 11 13
#[4,] 6 8 10 10 12 14
A third approach is to concatenate (c) the column indices from c_w, use them to get the columns of 'mat', and create an array with the specified dimensions. Here, 4 is the number of rows of 'mat', 2 is the 'm' specified in combn, and 6 is ncol(c_w). Then loop with apply, specifying MARGIN = 3, and get the rowMeans.
apply(array(mat[,c(c_w)], c(4,2,6)), 3, rowMeans)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 5 7 7 9 11
#[2,] 4 6 8 8 10 12
#[3,] 5 7 9 9 11 13
#[4,] 6 8 10 10 12 14
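The c(4,2,6) above is hard-coded for this example. A minimal sketch that derives the dimensions from 'mat' and 'c_w' instead, and (as an alternative to apply, not part of the original answer) averages over the middle dimension with colMeans after moving it to the front with aperm:

```r
mat <- matrix(1:100, nrow = 4)
c_w <- combn(c(1, 2, 3, 4), 2)

# nrow(mat) rows, nrow(c_w) columns per set (the m of combn), ncol(c_w) sets
arr <- array(mat[, c(c_w)], dim = c(nrow(mat), nrow(c_w), ncol(c_w)))

# Same result as apply(arr, 3, rowMeans): put the "columns per set"
# dimension first, then colMeans averages over it for every row and set
res <- colMeans(aperm(arr, c(2, 1, 3)))
res
```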
Or, as @A.Webb mentioned, apply is more natural for a matrix like c_w:
apply(c_w,2,function(i) rowMeans(mat[,i]))
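If the aim is to avoid R-level loops entirely, one option (a swapped-in technique, not one of the answers above) is to encode every combination as a column of a weight matrix and do a single matrix product. W is a hypothetical name: column j holds 1/m in the rows listed in c_w[, j] and 0 elsewhere, so mat %*% W averages exactly those columns.

```r
mat <- matrix(1:100, nrow = 4)
c_w <- combn(c(1, 2, 3, 4), 2)
k <- nrow(c_w)                                  # columns per set (the m of combn)

# W: one column per combination, 1/k in the rows that combination selects
W <- matrix(0, nrow = ncol(mat), ncol = ncol(c_w))
W[cbind(c(c_w), rep(seq_len(ncol(c_w)), each = k))] <- 1 / k

res <- mat %*% W    # column j equals rowMeans(mat[, c_w[, j]])
res
```

W is a dense ncol(mat) x ncol(c_w) matrix, so with many columns or combinations it may cost a lot of memory; whether it beats the combn(FUN = ...) version is worth benchmarking on real data.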

Related

R - Randomize each row in a matrix separately

I am trying to randomize a matrix such that the rows in each column are randomized individually, so that in the final matrix there is no association between columns. I know that I need to use the sample() function and some sort of for(each column) loop, but I'm not exactly sure how to go about doing it. Specifically, I am asking how to write a function that will loop through the columns of a matrix and randomize the rows of each column.
Edit: An example of what I'm trying to achieve
Original matrix:
X1 X2 X3
1 4 3 6
2 7 2 4
3 9 5 1
Sample desired output:
X1 X2 X3
1 7 3 1
2 4 5 6
3 9 2 4
As you can see, the rows in each column have been randomized separately.
If you have a matrix X, you can use apply(), which is well suited to matrices:
apply(X, 2, sample)
Example:
X <- matrix(1:25, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 6 11 16 21
# [2,] 2 7 12 17 22
# [3,] 3 8 13 18 23
# [4,] 4 9 14 19 24
# [5,] 5 10 15 20 25
Applying the code above gives:
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 10 11 16 21
# [2,] 5 8 12 20 22
# [3,] 4 9 14 18 24
# [4,] 2 6 15 19 25
# [5,] 1 7 13 17 23
I did not set a random seed via set.seed(), so you will get a different result when you run it. All that matters is that each column is shuffled independently.
If X is a data frame, use sapply() instead:
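To make the shuffle reproducible, fix the seed first. This sketch also checks that every column of the result is a permutation of the matching column of X (the seed value 1 is arbitrary):

```r
X <- matrix(1:25, 5)

set.seed(1)                       # any fixed seed makes the shuffle reproducible
Y <- apply(X, 2, sample)

# Each column of Y is a permutation of the matching column of X
all(apply(Y, 2, sort) == X)       # TRUE
```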
sapply(X, sample)
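Note that sapply() simplifies its result to a matrix, so the data frame class (and any non-numeric column types) is lost. Assuming you want to keep X as a data frame, a sketch that assigns the shuffled columns back with X[] <- lapply(...), using the example data from the question:

```r
# The example from the question, as a data frame
X <- data.frame(X1 = c(4, 7, 9), X2 = c(3, 2, 5), X3 = c(6, 4, 1))

set.seed(42)
# lapply returns a list of shuffled columns; X[] <- keeps the data.frame shell
X[] <- lapply(X, sample)
X
```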
You could use a for loop over the columns.
Or you could use:
apply(x, 2, function(col) sample(col, replace = FALSE))
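The for-loop version mentioned above could look like this sketch. It shuffles each column of a copy in place and gives the same kind of result as apply(X, 2, sample):

```r
X <- matrix(1:25, 5)
Y <- X                               # work on a copy so X stays unchanged

for (j in seq_len(ncol(Y))) {
  Y[, j] <- sample(Y[, j])           # permute the rows of column j only
}
Y
```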

Add columns of a matrix based on values of another vector

Suppose I have the following matrix:
mat <- matrix(1:20, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
and the following vector
counts=c(2,1,2)
I need to collapse this matrix by adding columns according to the values in counts: the first two columns must be added, the third stays unchanged, and the last two columns are summed. The resulting matrix should look like this:
[,1] [,2] [,3]
[1,] 6 9 30
[2,] 8 10 32
[3,] 10 11 34
[4,] 12 12 36
How can I do this automatically, given that in my case I have a very big matrix and a vector of counts with different values?
One way is to replicate seq_along(counts) by 'counts' with rep(seq_along(counts), counts), use that grouping vector to split the column sequence of 'mat' into a list, loop through the list with sapply, subset 'mat' with each element's column indices, and take the rowSums. drop = FALSE keeps single-column groups (such as the third column) as a matrix, which rowSums requires.
mat2 <- sapply(split(1:ncol(mat), rep(seq_along(counts), counts)),
function(i) rowSums(mat[,i,drop=FALSE]))
dimnames(mat2) <- NULL
mat2
# [,1] [,2] [,3]
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
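To see what the grouping step produces, here are the intermediate objects for counts = c(2, 1, 2):

```r
mat <- matrix(1:20, ncol = 5)
counts <- c(2, 1, 2)

g <- rep(seq_along(counts), counts)   # group id for each column of mat
g
# [1] 1 1 2 3 3

split(seq_len(ncol(mat)), g)          # column indices belonging to each group
# $`1`
# [1] 1 2
#
# $`2`
# [1] 3
#
# $`3`
# [1] 4 5
```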
Another idea, conceptually similar to akrun's:
t(rowsum(t(mat), rep(seq_along(counts), counts)))
# 1 2 3
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
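Another loop-free option, a swapped-in technique rather than either answer above, is to build a 0/1 indicator matrix from the same grouping vector and multiply. It does a full matrix product, which is more arithmetic than rowsum(), so whether it is faster on a very big matrix is an assumption worth benchmarking.

```r
mat <- matrix(1:20, ncol = 5)
counts <- c(2, 1, 2)

g <- rep(seq_along(counts), counts)     # group id per column: 1 1 2 3 3
# Row i of ind is the identity row for column i's group,
# so ind[i, k] is 1 exactly when column i belongs to group k
ind <- diag(length(counts))[g, , drop = FALSE]

res <- mat %*% ind                      # result is nrow(mat) x length(counts)
res
```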

Move columns by one [duplicate]

This question already has answers here: Moving columns within a data.frame() without retyping (17 answers). Closed 8 years ago.
I would like to move the columns of a matrix one position to the right.
Input <- data.frame(read.csv2 ....)
The matrix looks like:
1 2 3 4
1 2 3 4
1 2 3 4
and should be like:
4 1 2 3
4 1 2 3
4 1 2 3
I googled it but couldn't find anything.
Thanks for your help!
This looks like a good match: Moving columns within a data.frame() without retyping
Although the answer in the comments works for a one-column shift to the right, it's fiddly to extend that approach to other shifts and directions.
It boils down to generating the vector of column positions in the order you want, and then subsetting the columns with it.
So your original question boils down to generating c(4,1,2,3). There's a handy function in the magic package that can do this:
> install.packages("magic") # if you don't have it
> magic::shift(1:4,1)
[1] 4 1 2 3
So, taking Data <- matrix(1:16, 4) as the example matrix:
> Data[,magic::shift(1:ncol(Data),1)]
[,1] [,2] [,3] [,4]
[1,] 13 1 5 9
[2,] 14 2 6 10
[3,] 15 3 7 11
[4,] 16 4 8 12
answers your original question. This is then easy to extend to shifts by more than one, or negative (leftward) shifts:
> Data[,magic::shift(1:ncol(Data),-2)]
[,1] [,2] [,3] [,4]
[1,] 9 13 1 5
[2,] 10 14 2 6
[3,] 11 15 3 7
[4,] 12 16 4 8
Of course, the natural next step is to wrap this in a matrix shift function (drop = FALSE keeps a one-column result as a matrix):
> mshift <- function(m, n = 1) m[, magic::shift(seq_len(ncol(m)), n), drop = FALSE]
which you can check:
> mshift(Data,1)
[,1] [,2] [,3] [,4]
[1,] 13 1 5 9
[2,] 14 2 6 10
[3,] 15 3 7 11
[4,] 16 4 8 12
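If you would rather not depend on the magic package, the same wrap-around index can be computed with modular arithmetic in base R. A sketch with a hypothetical helper name, mshift_base; it also works on data frames such as the one read with read.csv2, since [ with drop = FALSE keeps the data.frame:

```r
# Hypothetical base-R helper: shift columns n places to the right
# (negative n shifts left), wrapping around, without the magic package
mshift_base <- function(m, n = 1) {
  k <- ncol(m)
  idx <- (seq_len(k) - n - 1) %% k + 1   # k = 4, n = 1 gives 4 1 2 3
  m[, idx, drop = FALSE]
}

Data <- matrix(1:16, 4)
mshift_base(Data, 1)     # same as Data[, c(4, 1, 2, 3)]
mshift_base(Data, -2)    # same as Data[, c(3, 4, 1, 2)]
```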

Transform a Matrix to a Matrix of Cumulative Row Averages in R

I have a matrix of predictions. Each row is a prediction for an individual and each column is the prediction from a specific model. I'd like to transform this so the first column is the prediction from the 1st model, the 2nd column is the average of the predictions of the 1st and 2nd models, etc.
So, the transformed matrix would hold the running cumulative average of the observations in the original matrix.
I suspect cumsum can be used with an apply function to achieve this, but I'm not sure how to arrive at an elegant result (for use with large matrices).
Thanks!
Try this:
# Initialize a testing matrix
(m <- matrix(1:12, 3, 4))
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
# Calculate cumulative average by column for each row
t(apply(m, 1, cumsum) / seq(ncol(m)))
[,1] [,2] [,3] [,4]
[1,] 1 2.5 4 5.5
[2,] 2 3.5 5 6.5
[3,] 3 4.5 6 7.5
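apply(m, 1, cumsum) returns the row-wise cumulative sums as the columns of its result, so dividing by seq(ncol(m)) recycles that vector down each of those columns (the j-th cumulative sum is divided by j). t() then transposes the result back to one row per individual.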
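For a matrix with many rows and relatively few model columns, an alternative sketch (not the answer's method) is to loop over the columns instead of calling cumsum once per row, so each step is one vectorised operation over all rows. cum_row_means is a hypothetical name, and any speed gain is an assumption to benchmark on your own data.

```r
# Hypothetical helper: running mean across columns, computed for all rows at once.
# The loop runs over columns (models) only; each step is vectorised over rows.
cum_row_means <- function(m) {
  cs <- m * 1.0                          # double copy, avoids integer overflow
  if (ncol(cs) > 1) {
    for (j in 2:ncol(cs)) {
      cs[, j] <- cs[, j - 1] + cs[, j]   # running row sums
    }
  }
  cs / col(cs)                           # divide column j by j
}

m <- matrix(1:12, 3, 4)
cum_row_means(m)
```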
Edit: In case you're doing something similar with data frames, this approach using the data.table and reshape2 packages could be useful:
library(data.table)
dt <- data.table(m)
# Add a row id to use as the id variable when melting
dt[, row := seq(nrow(dt))]
library(reshape2)
dt.molten <- data.table(melt(dt, "row"))
# Running mean within each row; as.numeric(variable) turns the factor V1..V4 into the column position
dt.molten[, cumsum(value) / as.numeric(variable), "row"]
row V1
1: 1 1.0
2: 1 2.5
3: 1 4.0
4: 1 5.5
5: 2 2.0
6: 2 3.5
7: 2 5.0
8: 2 6.5
9: 3 3.0
10: 3 4.5
11: 3 6.0
12: 3 7.5
Using the suggested cumsum and apply:
mat <- matrix(1:24,ncol=6)
mat
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 5 9 13 17 21
#[2,] 2 6 10 14 18 22
#[3,] 3 7 11 15 19 23
#[4,] 4 8 12 16 20 24
t(apply(mat,1,cumsum)/(seq_len(ncol(mat))))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 3 5 7 9 11
#[2,] 2 4 6 8 10 12
#[3,] 3 5 7 9 11 13
#[4,] 4 6 8 10 12 14

select submatrix in R

I have a matrix called m as follows
> m<-matrix(1:15,3,5)
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 4 7 10 13
[2,] 2 5 8 11 14
[3,] 3 6 9 12 15
I want to remove the first column of this matrix. Within a function I pass a value called j, which is always 1 less than the number of columns in m (in this example, j is 4).
Therefore I used the following code:
>m[,2:4+1]
[,1] [,2] [,3]
[1,] 7 10 13
[2,] 8 11 14
[3,] 9 12 15
But it gives only the last 3 columns. Then I changed the code as follows:
>m[,2:(4+1)]
This time I had the correct output.
It also gives the same output for the following code:
> m[,1:4+1]
Can somebody please explain how the following lines work?
>m[,2:4+1]
>m[,1:4+1]
: has higher precedence than +, so 2:4+1 is interpreted as (2:4)+1, which is the same as 3:5:
2:4+1
[1] 3 4 5
Similarly, 1:4+1 gets interpreted as 2:5:
1:4+1
[1] 2 3 4 5
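Applied to the question's j, this means either parenthesising the sum or building the index with seq_len(). As a sketch: the seq_len() form also yields an empty selection when j is 0, whereas 2:(j + 1) silently becomes 2:1 and counts down.

```r
m <- matrix(1:15, 3, 5)
j <- ncol(m) - 1                  # 4 in this example

m[, 2:(j + 1)]                    # parentheses make + happen before :
m[, seq_len(j) + 1]               # same columns: 2 3 4 5

# When j is 0, seq_len() gives no columns, but 2:(j + 1) is 2:1
j <- 0
ncol(m[, seq_len(j) + 1, drop = FALSE])   # 0 columns
ncol(m[, 2:(j + 1), drop = FALSE])        # 2 columns (2 and 1)
```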
To remove columns from a matrix, it's probably easier to use a negative subscript with [:
m[,-1]
[,1] [,2] [,3] [,4]
[1,] 4 7 10 13
[2,] 5 8 11 14
[3,] 6 9 12 15
