Suppose I have the following matrix:
mat <- matrix(1:20, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
and the following vector
counts=c(2,1,2)
I need to collapse this matrix by summing columns according to the values in counts. That is, the first two columns must be added together, the third stays as it is, and the last two columns are summed. The resulting matrix should look like this:
[,1] [,2] [,3]
[1,] 6 9 30
[2,] 8 10 32
[3,] 10 11 34
[4,] 12 12 36
How can I do this automatically, given that my real matrix is very big and the counts vector holds different values?
One way is to build a grouping index by repeating each position of 'counts' as many times as its value, use it to split the column indices of 'mat' into a list, loop over the list with sapply, and take the rowSums of the corresponding columns of 'mat' (drop=FALSE keeps single-column groups as matrices).
mat2 <- sapply(split(1:ncol(mat), rep(seq_along(counts), counts)),
function(i) rowSums(mat[,i,drop=FALSE]))
dimnames(mat2) <- NULL
mat2
# [,1] [,2] [,3]
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
Another idea, conceptually similar to akrun's:
t(rowsum(t(mat), rep(seq_along(counts), counts)))
# 1 2 3
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
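The same grouping can also be written as a single matrix product. This is only a minimal sketch and not part of the answers above: the names `grp` and `ind` are mine, and it assumes `sum(counts) == ncol(mat)`. It builds a 0/1 indicator matrix that maps each column of `mat` to its group and multiplies by it.

```R
mat <- matrix(1:20, ncol = 5)
counts <- c(2, 1, 2)

# Assumption: the counts must cover every column exactly once
stopifnot(sum(counts) == ncol(mat))

# Group label for each column of mat: 1 1 2 3 3
grp <- rep(seq_along(counts), counts)

# ncol(mat) x length(counts) indicator matrix: ind[j, g] is 1 if column j is in group g
ind <- 1 * outer(grp, seq_along(counts), "==")

# Each output column is the sum of the input columns in that group
res <- mat %*% ind
res
#      [,1] [,2] [,3]
# [1,]    6    9   30
# [2,]    8   10   32
# [3,]   10   11   34
# [4,]   12   12   36
```

For a very wide matrix the indicator matrix is mostly zeros, so the rowsum() version is likely the more memory-friendly choice.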
I have the following matrix
> mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
> mat
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 12 8
[6,] 12 9
[7,] 12 10
[8,] 12 11
[9,] 12 12
[10,] 13 12
I would like to remove duplicate rows based on the first column's values, keeping the row whose entry in the second column is the maximum. E.g., for the example above, the desired outcome is
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
I tried with
> mat[!duplicated(mat[,1]),]
but I obtained
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 13 12
which differs from the desired outcome in entry [4,2]. Suggestions?
You can sort the matrix first, using ascending order for column 1 and descending order for column 2. Then the duplicated function will remove all but the maximum column 2 value for each column 1 value.
mat <- mat[order(mat[,1],-mat[,2]),]
mat[!duplicated(mat[,1]),]
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
Like Joseph's solution, but if you add row names first you can keep the original order (which will be the same in this case).
rownames(mat) <- 1:nrow(mat)
mat <- mat[order(mat[,1], -mat[,2]),]
mat <- mat[!duplicated(mat[,1]),]
mat[order(as.numeric(rownames(mat))),]
# [,1] [,2]
# 1 9 6
# 2 10 6
# 3 11 7
# 4 12 12
# 5 13 12
First Sort then keep only the first row for each duplicate
mat <- mat[order(mat[,1], mat[,2]),]
mat[!duplicated(mat[,1]),]
EDIT: Sorry, I thought your desired result was the last output. OK, so you want the max value:
mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
#Reverse sort
mat <- mat[order(mat[,1], mat[,2], decreasing=TRUE),]
#Keep only the first row for each duplicate, this will give the largest values
mat <- mat[!duplicated(mat[,1]),]
#finally sort it
mat <- mat[order(mat[,1], mat[,2]),]
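For the two-column case in the question, the maximum per key can also be computed directly, without sorting. This is a hedged sketch rather than one of the answers above: the names `grp_max` and `res` are mine, and it assumes the first column holds the key and the second the value to maximise.

```R
mat <- rbind(c(9,6), c(10,6), c(11,7), c(12,7), c(12,8),
             c(12,9), c(12,10), c(12,11), c(12,12), c(13,12))

# Maximum of column 2 within each distinct value of column 1
grp_max <- tapply(mat[, 2], mat[, 1], max)

# Rebuild a two-column matrix: key, then maximum
res <- cbind(as.numeric(names(grp_max)), as.vector(grp_max))
res
#      [,1] [,2]
# [1,]    9    6
# [2,]   10    6
# [3,]   11    7
# [4,]   12   12
# [5,]   13   12
```

Note that the result comes back sorted by the key, and that this only carries two columns; when whole rows with further columns must be kept, the order() plus duplicated() approach is the better fit.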
I have a very long matrix, measuring 3*10^5 x 3 entries. It basically consists of 10,000 subblocks of 30 x 3 matrices, stacked on top of one another. I want to efficiently "cbind" them next to one another (without looping constructs), leading to a 30 x 3*10^4 matrix.
Just changing the matrix dimensions does not work, as R fills the new matrix per individual column.
I'm sure there is a very compact, superefficient way of doing this, and I'll slap myself on the forehead as soon as you fill me in on the obvious solution.
Thanks!
For example:
```R
test <- matrix(c(1:18), 6, 3, byrow = FALSE)
test
# [,1] [,2] [,3]
#[1,] 1 7 13
#[2,] 2 8 14
#[3,] 3 9 15
#[4,] 4 10 16
#[5,] 5 11 17
#[6,] 6 12 18
dim(test) <- c(3,6)
test
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 4 7 10 13 16
#[2,] 2 5 8 11 14 17
#[3,] 3 6 9 12 15 18
```
The output I'm looking for is:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 7 13 4 10 16
[2,] 2 8 14 5 11 17
[3,] 3 9 15 6 12 18
We can create a grouping variable to split the sequence of rows, subset the matrix and then cbind
do.call(cbind, lapply(split(seq_len(nrow(test)),
as.integer(gl(nrow(test), 3, nrow(test)))), function(i) test[i,]))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 7 13 4 10 16
#[2,] 2 8 14 5 11 17
#[3,] 3 9 15 6 12 18
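A loop-free alternative is to treat the stacked matrix as a three-dimensional array and permute its dimensions. The sketch below is an assumption-laden illustration, not the answer's method: the names `block_rows`, `n_blocks`, `a` and `res` are mine, and it assumes every block has the same number of rows.

```R
test <- matrix(1:18, 6, 3)

block_rows <- 3                          # rows per stacked block
n_blocks   <- nrow(test) %/% block_rows  # here: 2 blocks
stopifnot(n_blocks * block_rows == nrow(test))

# View the data as (row within block) x (block) x (column)
a <- array(test, c(block_rows, n_blocks, ncol(test)))

# Reorder to (row within block) x (column) x (block), then flatten
# the last two dimensions so the blocks sit side by side
res <- aperm(a, c(1, 3, 2))
dim(res) <- c(block_rows, ncol(test) * n_blocks)
res
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    1    7   13    4   10   16
# [2,]    2    8   14    5   11   17
# [3,]    3    9   15    6   12   18
```

For the matrix in the question, `block_rows` would be 30, giving 10,000 blocks of 3 columns each.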
I am trying to randomize a matrix such that each of the rows in each column are randomized individually so that in the final matrix there is no association between columns. I know that I need to use the sample() function and some sort of for(each column) loop, but I'm not exactly sure of how to go about doing it. Specifically, I am asking how to write a function that will loop through the columns of a matrix and randomize the rows of each column.
Edit: An example of what I'm trying to achieve
Original matrix:
X1 X2 X3
1 4 3 6
2 7 2 4
3 9 5 1
Sample desired output:
X1 X2 X3
1 7 3 1
2 4 5 6
3 9 2 4
As you can see, the rows in each column have been randomized separately.
If you have a matrix X, you can use apply() (ideal for matrix)
apply(X, 2, sample)
Example:
X <- matrix(1:25, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 6 11 16 21
# [2,] 2 7 12 17 22
# [3,] 3 8 13 18 23
# [4,] 4 9 14 19 24
# [5,] 5 10 15 20 25
Applying the code above gives:
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 10 11 16 21
# [2,] 5 8 12 20 22
# [3,] 4 9 14 18 24
# [4,] 2 6 15 19 25
# [5,] 1 7 13 17 23
I did not set a random seed via set.seed(), so you will get a different result when you run it. All you need to know is that the result is random.
If you have a data frame X, it is better to use sapply(), which loops over the columns (note that the result is returned as a matrix)
sapply(X, sample)
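To make such a shuffle reproducible and to check that it only reorders values within each column, something like the following sketch can be used. The seed value and the names `X_shuffled`, `same_values`, `df` and `df_shuffled` are arbitrary choices of mine; the `[] <- lapply(...)` idiom is shown as one way to keep a data frame as a data frame.

```R
set.seed(42)  # arbitrary seed, only so the result can be reproduced

X <- matrix(1:25, 5)

# Shuffle each column independently
X_shuffled <- apply(X, 2, sample)

# Every column must still contain exactly the same values as before
same_values <- all(apply(X_shuffled, 2, sort) == apply(X, 2, sort))
same_values

# Data frame variant that keeps the data frame structure and column names
df <- as.data.frame(X)
df_shuffled <- df
df_shuffled[] <- lapply(df, sample)
```

One caveat: when a column has a single numeric value n, sample() draws from 1:n instead of returning that value, so a one-row input needs separate handling.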
You could use a for loop for each column.
Or you could use:
apply(x, 2, function(col) sample(col, replace=FALSE))
What I would like to do is the following, given a matrix, for example: mat <- matrix(1:100, nrow = 4) and a set of combinations of the columns c_w <- combn(c(1,2,3,4), 2). I would like to calculate the average per combination. So for the first combination, we have rowMeans(mat[,c_w[,1]]), for the second rowMeans(mat[,c_w[,2]]). So far so good and I can wrap this in a for-loop and then use row combine to combine the results in a nice results matrix. However the problem is performance, if possible I would like to do this in a vectorized manner. So my question is:
can we do this without for loops in the R-code?
edit
I would like to have it in Matrix form, where each column stands for the mean of each set. However this can also be achieved with some small additions to Arun's code. Please turn the comment into an answer in order for me to give you points :).
Thanks
We can use the FUN argument in combn to do the rowMeans directly within the combn step after subsetting the columns of 'mat' with the column index derived from combn
combn(1:4, 2, FUN=function(x) rowMeans(mat[,x]))
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 5 7 7 9 11
#[2,] 4 6 8 8 10 12
#[3,] 5 7 9 9 11 13
#[4,] 6 8 10 10 12 14
Or another option if we got the combn output would be to split by the col of 'c_w' and loop through the 'list' elements with sapply, subset the 'mat' with the numeric index and get the rowMeans
sapply(split(c_w, col(c_w)), function(x) rowMeans(mat[,x]))
# 1 2 3 4 5 6
#[1,] 3 5 7 7 9 11
#[2,] 4 6 8 8 10 12
#[3,] 5 7 9 9 11 13
#[4,] 6 8 10 10 12 14
Or a third approach would be to concatenate (c) the column indices from c_w, use them to extract the columns of 'mat', and create an array with the specified dimensions. Here, 4 is the number of rows of 'mat', 2 is the 'm' specified in combn, and 6 is the ncol of 'c_w'. Loop with apply, specify the MARGIN as 3, and get the rowMeans.
apply(array(mat[,c(c_w)], c(4,2,6)), 3, rowMeans)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 3 5 7 7 9 11
#[2,] 4 6 8 8 10 12
#[3,] 5 7 9 9 11 13
#[4,] 6 8 10 10 12 14
Or as @A.Webb mentioned, apply would be more natural for a matrix like c_w
apply(c_w,2,function(i) rowMeans(mat[,i]))
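As a sanity check, the variants above can be compared against the explicit for loop from the question. The sketch also adds one fully vectorized form that is my own assumption-based shortcut: it only works for pairs (m = 2), where the mean of two columns is half their sum. All `res_*` names are mine.

```R
mat <- matrix(1:100, nrow = 4)
c_w <- combn(1:4, 2)

# Baseline: explicit loop over the combinations
res_loop <- matrix(NA_real_, nrow(mat), ncol(c_w))
for (k in seq_len(ncol(c_w))) {
  res_loop[, k] <- rowMeans(mat[, c_w[, k]])
}

# Variants from the answers
res_combn <- combn(1:4, 2, FUN = function(x) rowMeans(mat[, x]))
res_apply <- apply(c_w, 2, function(i) rowMeans(mat[, i]))

# Pairs only: index both members of every pair at once, no R-level loop
res_vec <- (mat[, c_w[1, ]] + mat[, c_w[2, ]]) / 2

all.equal(res_loop, res_combn)
all.equal(res_loop, res_apply)
all.equal(res_loop, res_vec)
```

The combn, sapply and apply versions still loop internally over the combinations; only the last form avoids a per-combination function call, at the cost of being tied to m = 2.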
I have 17 square matrices of order 430 and a large matrix of dimension 92235 x 34
1. For each square matrix, I wish to add each row's value (from 1 to 430) to a column in the large matrix, taking only those values above the main diagonal. So [1,2][1,3][1,4]..[1,430][2,3][2,4]...[2,430]...[429,430] -- hence, the 92235 row length of the large matrix (A sample of Step 1 is shown here http://imgur.com/4SlUenK)
2. The square matrix is transposed
3. Step 1 is repeated but the row values are added to the next column in the large matrix
4. Repeat Steps 1-3 16 more times until the large matrix is filled
How do I go about doing this?
TIA
EDIT FOR COMMENT
mat = matrix(1:25,5,5)
mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 6 11 16 21
[2,] 2 7 12 17 22
[3,] 3 8 13 18 23
[4,] 4 9 14 19 24
[5,] 5 10 15 20 25
mat2 = cbind(mat[upper.tri(mat)])
mat2
[,1]
[1,] 6
[2,] 11
[3,] 12
[4,] 16
[5,] 17
[6,] 18
[7,] 21
[8,] 22
[9,] 23
[10,] 24
This reads columns then rows. Instead, I would like to read rows then columns so that the result should be:
[,1]
[1,] 6
[2,] 11
[3,] 16
[4,] 21
[5,] 12
[6,] 17
[7,] 22
[8,] 18
[9,] 23
[10,] 24
If you have 17 square matrices ('m1', 'm2',...'m17'), keep them in a list and then use upper.tri to extract the elements above the diagonal and cbind with the elements from the transpose
lst1 <- mget(paste0('m', 1:17))
Out <- do.call(cbind,lapply(lst1, function(x) {x1 <- t(x)
cbind(x[upper.tri(x)], x1[upper.tri(x1)]) }))
dim(Out)
#[1] 92235 34
Here I created the matrices in a list.
Update
Based on the row order of the data,
mat1 <- mat
mat1[lower.tri(mat1, diag=TRUE)] <- NA
as.vector(na.omit(unlist(tapply(mat1, row(mat1), FUN=I))))
#[1] 6 11 16 21 12 17 22 18 23 24
Or as @David Arenburg mentioned in the comments
temp <- t(mat)
temp[lower.tri(temp)]
#[1] 6 11 16 21 12 17 22 18 23 24
You can substitute this extraction for the upper.tri step inside the lapply above.
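Putting the row-order extraction into the lapply could look roughly like the sketch below. It is an illustration under my own assumptions: it uses three small 5 x 5 matrices (`lst_small`, with `n` standing in for 430) so the result is easy to inspect, and it relies on the fact that the above-diagonal elements of t(x) read row by row are the below-diagonal elements of x read column by column.

```R
set.seed(24)
n <- 5  # small stand-in for 430
lst_small <- replicate(3, matrix(sample(1:200, n * n, replace = TRUE), n, n),
                       simplify = FALSE)

out <- do.call(cbind, lapply(lst_small, function(x) {
  cbind(t(x)[lower.tri(x)],  # above the diagonal of x, read row by row
        x[lower.tri(x)])     # above the diagonal of t(x), read row by row
}))
dim(out)  # choose(n, 2) rows, 2 columns per matrix

# Check against the 5 x 5 example from the question
mat <- matrix(1:25, 5, 5)
row_order <- t(mat)[lower.tri(mat)]
row_order
# [1]  6 11 16 21 12 17 22 18 23 24
```

With 17 matrices of order 430 the same call would give choose(430, 2) = 92235 rows and 34 columns.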
data
set.seed(24)
lst1 <- replicate(17, matrix(sample(1:200, 430*430, replace=TRUE),
430, 430), simplify=FALSE)