I have a matrix that is 39 columns wide, and I want the row-wise average of the first three columns, then the next three, etc., so that I end up with 13 columns in total. Triple holds the indexes I would like to use, but it just produces a vector from 1:39.
Triple <- c(1:3, 4:6, 7:9, 10:12, 13:15, 16:18, 19:21, 22:24, 25:27, 28:30, 31:33, 34:36, 37:39)
AveFPKM <- matrix(nrow=54175, ncol=13)
for (i in 1:39){
Ave <- rowMeans(AllFPKM[,i:i+2])
AveFPKM[,i] <- Ave
i+2
}
Thanks for the help
With some specifying of dimensions and apply-ing, you can pretty easily get your result. Here's a smaller example:
test <- matrix(1:36,ncol=12)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,] 1 4 7 10 13 16 19 22 25 28 31 34
#[2,] 2 5 8 11 14 17 20 23 26 29 32 35
#[3,] 3 6 9 12 15 18 21 24 27 30 33 36
Now get the mean of each row in each block of three columns:
apply(structure(test,dim=c(3,3,4)),c(1,3),mean)
# [,1] [,2] [,3] [,4]
#[1,] 4 13 22 31
#[2,] 5 14 23 32
#[3,] 6 15 24 33
Or generally, assuming your number of columns is always exactly divisible by the group size:
grp.row.mean <- function(x,grpsize) {
apply(structure(x,dim=c(nrow(x),grpsize,ncol(x)/grpsize)),c(1,3),mean)
}
grp.row.mean(test,3)
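The same reshaping idea can be sketched without apply by letting colMeans average over the group dimension. This is only an illustration on the small test matrix; the names arr and res are made up for the example.

```r
test <- matrix(1:36, ncol = 12)
grpsize <- 3

# View the matrix as (row, column within group, group)
arr <- array(test, dim = c(nrow(test), grpsize, ncol(test) / grpsize))

# Move the within-group dimension to the front; colMeans then averages over it
res <- colMeans(aperm(arr, c(2, 1, 3)))
res
```

Like grp.row.mean, this assumes the number of columns is exactly divisible by the group size.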
Here's a solution using sapply, taking advantage of the fact we know the number of columns is exactly a multiple of 3:
sapply(1:13, function(x) {
i <- (x-1)*3 + 1 # Get the actual starting index
rowMeans(AllFPKM[,i:(i+2)]) # average the source matrix, not the empty result matrix
})
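For reference, the loop in the question fails for two reasons: `i:i+2` is parsed as `(i:i)+2`, which is the single column `i+2`, and the bare `i+2` at the end of the loop body does not change the loop variable. A minimal corrected loop might look like the sketch below. The small random AllFPKM is a stand-in for the real data, and n_groups and first are names invented for the example.

```r
set.seed(1)
# Stand-in for the real 54175 x 39 matrix (assumption: any numeric matrix with 39 columns)
AllFPKM <- matrix(runif(6 * 39), nrow = 6, ncol = 39)

n_groups <- ncol(AllFPKM) / 3
AveFPKM <- matrix(NA_real_, nrow = nrow(AllFPKM), ncol = n_groups)

for (g in seq_len(n_groups)) {
  first <- (g - 1) * 3 + 1                              # first column of group g
  AveFPKM[, g] <- rowMeans(AllFPKM[, first:(first + 2)]) # note the parentheses
}
dim(AveFPKM)
```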
I'm currently looking at bank data for 9 consecutive quarters. I now want to only keep those banks for which I have data from all 9 quarters. Each bank has a unique certification ID. How can I filter using the ID and only keep banks with 9 consecutive observations?
Maybe a way to do this is to count how often a certification ID (cert) shows up and keep only the ones with 9 observations? So this is what I tried:
df <- (...)
a = rle(sort(df$cert))
b = data.frame(id=a$values, n=a$lengths)
c = subset(b, n==9)
I'm unsure if this is correct because I'm trying to reproduce the results of a research paper but the numbers don't match anymore after this step.
One option is n_distinct with group_by: group by 'id', check whether the number of distinct values of 'qtr' is 9, and keep the rows of those 'id's with filter.
library(dplyr)
df %>%
group_by(id) %>%
filter(n_distinct(qtr) ==9)
library(tidyverse)
df<-data.frame(id=rep(1:4,times=9),
qtr=rep(1:9,each=4))
df%>%
filter(id %in% (df%>%
count(id)%>%
filter(n>8)%>%.$id))
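A base R version of the same filter could look like the sketch below. The data frame is made up for illustration (id 3 is missing one quarter), and the column names id and qtr follow the answers above rather than the question's cert.

```r
# Toy data: ids 1 and 2 have all 9 quarters, id 3 has only 8
df <- data.frame(id  = c(rep(1, 9), rep(2, 9), rep(3, 8)),
                 qtr = c(1:9, 1:9, 1:8))

# Number of distinct quarters observed per id
n_qtr <- tapply(df$qtr, df$id, function(q) length(unique(q)))

# Keep only the rows whose id has all 9 quarters
keep_ids <- names(n_qtr)[n_qtr == 9]
complete <- df[df$id %in% keep_ids, ]
```

Counting distinct quarters rather than rows matters if an id can appear more than once in the same quarter. Note also that the rle attempt in the question only builds the table of ids. The original data frame still has to be subset by those ids.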
I generated an example. Use rowSums and !is.na to count the non-missing values in each row, then keep the rows that have values in all 9 columns.
a[rowSums(!is.na(a))==9,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 10 13 16 19 22 25
[2,] 3 6 9 12 15 18 21 24 27
The data used.
a <- matrix(1:27, ncol=9, nrow=3)
a[2,2] <- NA
a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 10 13 16 19 22 25
[2,] 2 NA 8 11 14 17 20 23 26
[3,] 3 6 9 12 15 18 21 24 27
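If every column is one of the 9 quarters, complete.cases from the stats package should give the same subset without hard-coding the number 9. A short sketch on the same example matrix (the names by_count and by_cases are made up):

```r
a <- matrix(1:27, ncol = 9, nrow = 3)
a[2, 2] <- NA

by_count <- a[rowSums(!is.na(a)) == 9, ]  # approach from the answer
by_cases <- a[complete.cases(a), ]        # rows with no NA at all
```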
I have a matrix like :
m <- matrix(c(1:32),ncol = 8)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 5 9 13 17 21 25 29
[2,] 2 6 10 14 18 22 26 30
[3,] 3 7 11 15 19 23 27 31
[4,] 4 8 12 16 20 24 28 32
I want to sum several columns, say columns 1, 2 and 3, and replace column 1 with the resulting vector, dropping the other two:
[,1] [,4] [,5] [,6] [,7] [,8]
[1,] 15 13 17 21 25 29
[2,] 18 14 18 22 26 30
[3,] 21 15 19 23 27 31
[4,] 24 16 20 24 28 32
My question is what is the best way to do this.
I have taken the sum and replaced the matrix with the vector.
X<-rowSums(m[,c(1,2,3)]); m[,1] <- X; m <- m[,-c(2,3)]
The columns are named in my case. Is there a better way to do this ?
We can use the numeric index of columns to subset and do the rowSums, then cbind with the columns that are not used in the rowSums
cbind(rowSums(m[,1:3]), m[, -(1:3)])
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 15 13 17 21 25 29
#[2,] 18 14 18 22 26 30
#[3,] 21 15 19 23 27 31
#[4,] 24 16 20 24 28 32
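Since the question mentions that the columns are named, the same idea can be written with names instead of positions. The column names below (a to h) and the name abc for the new column are invented for the example.

```r
m <- matrix(1:32, ncol = 8)
colnames(m) <- letters[1:8]   # hypothetical column names
to_sum <- c("a", "b", "c")    # columns to collapse into one

res <- cbind(abc = rowSums(m[, to_sum]),
             m[, !colnames(m) %in% to_sum])
res
```

Selecting by name keeps the code correct if the column order changes later.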
You could also resort to using apply, which is a bit lengthier than the rowSums() approach.
cbind(apply(m[,-c(4:ncol(m))], 1, function(x){sum(x,na.rm=T)} ), m[,c(4:ncol(m))])
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 15 13 17 21 25 29
#[2,] 18 14 18 22 26 30
#[3,] 21 15 19 23 27 31
#[4,] 24 16 20 24 28 32
However, rowSums is vectorized and will typically be the computationally faster way to go.
I would like to write a script that prints the roulette wheel numbers on my screen in the casino's layout: the first row is 3, 6, 9, ..., 36, the second row is 2, 5, 8, ..., 35, and the third row is 1, 4, 7, ..., 34.
I have written something
a= seq(3,36,3)
b= seq(2,35,3)
c=seq(1,34,3)
print (rbind(a,b,c))
but it may not be the most efficient way. Does anyone have a better answer?
matrix(1:36,ncol=12)[3:1,] gives you:
> matrix(1:36,ncol=12)[3:1,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 3 6 9 12 15 18 21 24 27 30 33 36
[2,] 2 5 8 11 14 17 20 23 26 29 32 35
[3,] 1 4 7 10 13 16 19 22 25 28 31 34
which is the same layout as your code but without row labels.
It's just a matrix with 12 columns whose rows are flipped via the [3:1,] index.
If you want to make an image of that then I suggest you use a screengrab program.
Suppose I have two square matrices (actually many more) that are bound together:
mat = matrix(1:18,nrow=3,ncol=6)
mat
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 4 7 10 13 16
[2,] 2 5 8 11 14 17
[3,] 3 6 9 12 15 18
I want to take the transpose of each (3x3) matrix and keep them glued side by side, so the result is:
mat2
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 10 11 12
[2,] 4 5 6 13 14 15
[3,] 7 8 9 16 17 18
I do not want to do this manually because it is MANY matrices cbound together, not just 2.
I would like a solution that avoids looping or apply (which is just a wrapper for a loop). I need the efficient solution because this will have to run tens of thousands of times.
One way is to use matrix indexing
matrix(t(m), nrow=nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow=T)) ]
This takes the transposed matrix and rearranges the columns in the desired order.
m <- matrix(1:18,nrow=3,ncol=6)
matrix(t(m), nrow=nrow(m))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] 1 10 2 11 3 12
# [2,] 4 13 5 14 6 15
# [3,] 7 16 8 17 9 18
So we want the 1st, 3rd, and 5th columns together, followed by the 2nd, 4th, and 6th columns.
One way is to index these with
c(matrix(1:ncol(m), nrow(m), byrow=T))
#[1] 1 3 5 2 4 6
As an alternative, you could use
idx <- rep(1:ncol(m), each=nrow(m), length=ncol(m))
do.call(cbind, split.data.frame(t(m), idx))
Try on a new matrix
(m <- matrix(1:50, nrow=5))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 6 11 16 21 26 31 36 41 46
# [2,] 2 7 12 17 22 27 32 37 42 47
# [3,] 3 8 13 18 23 28 33 38 43 48
# [4,] 4 9 14 19 24 29 34 39 44 49
# [5,] 5 10 15 20 25 30 35 40 45 50
matrix(t(m), nrow=nrow(m))[, c(matrix(1:ncol(m), nrow(m), byrow=T)) ]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 26 27 28 29 30
# [2,] 6 7 8 9 10 31 32 33 34 35
# [3,] 11 12 13 14 15 36 37 38 39 40
# [4,] 16 17 18 19 20 41 42 43 44 45
# [5,] 21 22 23 24 25 46 47 48 49 50
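Another loop-free way to think about this is to treat the matrix as a stack of square slices and transpose every slice at once with aperm. This is a sketch under the same assumption as above, namely that the number of columns is a multiple of the number of rows. The names n, k, slices and res are made up.

```r
m <- matrix(1:18, nrow = 3, ncol = 6)
n <- nrow(m)
k <- ncol(m) / n                      # number of square blocks

# Each slice [, , j] is one n x n block of m
slices <- array(m, dim = c(n, n, k))

# Swap the first two dimensions (transpose every slice), then flatten back
res <- matrix(aperm(slices, c(2, 1, 3)), nrow = n)
res
```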
This might do it:
mat = matrix(1:18,nrow=3,ncol=6)
mat
output <- lapply(seq(3, ncol(mat), 3), function(i) { t(mat[, c((i - 2):i)]) } )
output
do.call(cbind, output)
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 2 3 10 11 12
#[2,] 4 5 6 13 14 15
#[3,] 7 8 9 16 17 18
Was curious and timed the two approaches. The matrix approach used by user20650 is much faster than the lapply approach I used:
library(microbenchmark)
mat = matrix(1:1600, nrow=4, byrow = FALSE)
lapply.function <- function(x) {
# transpose each block of nrow(x) columns, then bind the results side by side
step1 <- lapply(seq(nrow(x), ncol(x), nrow(x)), function(i) {
t(x[, c((i - (nrow(x) - 1) ):i)])
} )
l.output <- do.call(cbind, step1)
return(l.output)
}
lapply.output <- lapply.function(mat)
matrix.function <- function(x) {
# reshape the transpose, then reorder its columns block by block
matrix(t(x), nrow=nrow(x))[, c(matrix(1:ncol(x), nrow(x), byrow=TRUE)) ]
}
matrix.output <- matrix.function(mat)
identical(lapply.function(mat), matrix.function(mat))
microbenchmark(lapply.function(mat), matrix.function(mat), times = 1000)
#Unit: microseconds
# expr min lq mean median uq max neval
# lapply.function(mat) 735.602 776.652 824.44917 791.443 809.856 2260.834 1000
# matrix.function(mat) 32.298 35.619 37.75495 36.826 37.732 78.481 1000
I have a little problem with a fast matrix transformation. The transformation has to be performed many times, so I'm looking for a fast way to do this. Imagine a given matrix A and an integer parameter L. Matrix A should be transformed into a new matrix newA with L rows and nrow(A)*ncol(A)/L columns: I take each block of L consecutive rows of A and place the blocks side by side. A little example with two possible solutions, newA1 and newA2 (where newA1 = newA2), should clarify my explanation:
A = matrix(1:24,6,4,byrow=T) #example matrix
L = 2 #number of rows for new matrix
newA1 = NULL
newA2 = matrix(0,L,nrow(A)*ncol(A)/L)
for(i in 1:(nrow(A)/L)){
newA1 = cbind(newA1,A[((i-1)*L+1):(i*L),]) #slower than newA2
newA2[1:L,((i-1)*ncol(A)+1):(i*ncol(A))] = A[((i-1)*L+1):(i*L),] #faster than newA1
}
The matrices look like this:
> A
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
[6,] 21 22 23 24
> newA1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 9 10 11 12 17 18 19 20
[2,] 5 6 7 8 13 14 15 16 21 22 23 24
L=3 works also in this example.
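One loop-free possibility, offered only as a sketch, is to reshape A into a 3-d array and reorder its dimensions with aperm. It assumes nrow(A) is divisible by L, and the names blocks and newA3 are made up for the example.

```r
A <- matrix(1:24, 6, 4, byrow = TRUE)
L <- 2

# Index the entries of A as (row within block, block, column)
blocks <- array(A, dim = c(L, nrow(A) / L, ncol(A)))

# Reorder to (row within block, column, block) and flatten to L rows
newA3 <- matrix(aperm(blocks, c(1, 3, 2)), nrow = L)
newA3
```

Whether this is actually faster than the preallocated loop for a given size would need to be timed, for example with microbenchmark.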