positions of non-NA cells in a matrix - r

Consider the following matrix,
m <- matrix(letters[c(1,2,NA,3,NA,4,5,6,7,8)], 2, byrow=TRUE)
## [,1] [,2] [,3] [,4] [,5]
## [1,] "a" "b" NA "c" NA
## [2,] "d" "e" "f" "g" "h"
I wish to obtain the column indices corresponding to all non-NA elements, merged with the NA elements immediately following:
result <- c(list(1), list(2:3), list(4:5),
list(1), list(2), list(3), list(4), list(5))
Any ideas?

The column (and row) indices of non-NA elements can be obtained with
which(!is.na(m), TRUE)
A full answer:
Since you want to work row-wise, but R stores matrices column-wise, it is easier to work on the transpose of m.
t_m <- t(m)
n_cols <- ncol(m)
We get the array indices as mentioned above, which give the starting point of each list.
ind_non_na <- which(!is.na(t_m), TRUE)
Since we are working on the transpose, we want the row indices, and we need to deal with each column separately.
start_points <- split(ind_non_na[, 1], ind_non_na[, 2])
The length of each list is given by the difference between starting points, or the difference between the last point and the end of the row (+1). Then we just call seq to get a sequence.
unlist(
  lapply(
    start_points,
    function(x) {
      len <- c(diff(x), n_cols - x[length(x)] + 1L)
      mapply(seq, x, length.out = len, SIMPLIFY = FALSE)
    }
  ),
  recursive = FALSE
)
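As a cross-check, here is a minimal sketch of a different technique (not the one used above): within each row, a running count of non-NA cells labels every cell with the non-NA cell it belongs to, and split() then groups the column indices by that label. The helper name group_cols is made up for illustration, and the sketch assumes every row starts with a non-NA cell.

```r
m <- matrix(letters[c(1, 2, NA, 3, NA, 4, 5, 6, 7, 8)], 2, byrow = TRUE)

# Hypothetical helper: group the column indices of one row so that each
# non-NA cell is merged with the NA cells immediately following it.
# Assumption: the row starts with a non-NA cell.
group_cols <- function(row) {
  grp <- cumsum(!is.na(row))          # group id increases at each non-NA cell
  unname(split(seq_along(row), grp))  # column indices per group
}

# Process the rows one by one, then flatten one level into a single list
result <- unlist(
  lapply(seq_len(nrow(m)), function(i) group_cols(m[i, ])),
  recursive = FALSE
)
str(result)
```

Leading NA cells in a row would end up in a group of their own, so they would need separate handling.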

This will get you close:
cols <- col(m)
cbind(cols[which(is.na(m))-1],cols[is.na(m)])
[,1] [,2]
[1,] 2 3
[2,] 4 5

Related

Return the list of dimnames of 3D array by selecting specific elements in r

Given a 3D array, I want to obtain the row names and column names of selected elements of the array.
For example,
mdat <- array(c(1,2,3, 11,12,13,1,2,3,2,3,4,3,4,5,4,5,6), dim= c(2, 3, 2),dimnames = list(c("row1", "row2"), c("C.1", "C.2", "C.3"),c("m1","m2")))
which(mdat[,,2]==2) returns the locations of the elements, but I want the paired row names and column names, which here are (row2, C.1) and (row2, C.2). I haven't found a way to get the dimnames from a 3D array. Has anyone tried this? Any suggestion is appreciated.
You can try the following:
tmp <- mdat[,,2]
mat <- which(tmp==2, arr.ind = TRUE)
cbind(rownames(tmp)[mat[, 1]], colnames(tmp)[mat[, 2]])
# [,1] [,2]
#[1,] "row2" "C.1"
#[2,] "row2" "C.2"
Maybe you can try which() together with dimnames(), like below:
inds <- which(mdat[,,2]==2, arr.ind = TRUE)
nm <- dimnames(mdat[,,2])
sapply(seq_along(nm),function(k) nm[[k]][inds[,k]])
which gives
[,1] [,2]
[1,] "row2" "C.1"
[2,] "row2" "C.2"
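Both answers above work on a single 2D slice. The same idea can be sketched for the whole 3D array, since which(..., arr.ind = TRUE) returns one index column per dimension. This is only an illustration: the names idx, dn and hits are made up, and only the first 12 values of the data are used because they are all that a 2 x 3 x 2 array holds.

```r
mdat <- array(c(1, 2, 3, 11, 12, 13, 1, 2, 3, 2, 3, 4),
              dim = c(2, 3, 2),
              dimnames = list(c("row1", "row2"),
                              c("C.1", "C.2", "C.3"),
                              c("m1", "m2")))

# One row per matching cell, one column per array dimension
idx <- which(mdat == 2, arr.ind = TRUE)
dn <- dimnames(mdat)

# Translate each index column into the names of that dimension
hits <- data.frame(
  row   = dn[[1]][idx[, 1]],
  col   = dn[[2]][idx[, 2]],
  slice = dn[[3]][idx[, 3]],
  stringsAsFactors = FALSE
)
hits
```

Filtering hits on slice == "m2" gives the two pairs asked for in the question. Building a data frame avoids the case where sapply() collapses a single match into a plain vector.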

scaling standardized or weighting

I hope I can find an answer to this question here. I have this piece of code that I am trying to analyze closely:
alphas <- matrix(runif(900), ncol=3, byrow=TRUE)
z <- t(apply(alphas, 1, cumsum))
for(i in 1:nrow(z)){
z[i, ] <- z[i, ] / (1:ncol(z))
}
I am trying to understand what the line z[i, ] <- z[i, ] / (1:ncol(z)) does to the matrix. I know we are dividing each row of z, element by element, by the sequence of column numbers. I also know that apply with margin 1 applies the function we are interested in, in this case cumsum, over the rows of the matrix alphas. That's basically all I know; I have no clue why the next line is there or what it does to my matrix.
I would appreciate some insights.
Thank you very much.
With your code you are calculating row-wise cumulative means of your alphas.
The line in your loop does a vector division that turns the cumulative sums in each row into cumulative means.
Look at what ncol(z) yields:
> ncol(z)
[1] 3
So basically what you're doing with z[i, ] / (1:ncol(z)) in your loop is dividing each row, element by element, by a vector (a sequence) whose length is the number of columns, i.e. c(1, 2, 3) or just 1:3.
Consider the first row of your alphas and your z.
set.seed(42) # for sake of reproducibility
alphas <- matrix(runif(900), ncol=3, byrow=TRUE)
z <- t(apply(alphas, 1, cumsum))
> alphas[1, ]
[1] 0.9148060 0.9370754 0.2861395
> z[1, ]
[1] 0.914806 1.851881 2.138021
> cbind(alphas[1, 1], mean(c(alphas[1, 1:2])), mean(c(alphas[1, 1:3])))
[,1] [,2] [,3]
[1,] 0.914806 0.9259407 0.7126737
The core of your loop yields
> z[1, ] / 1:ncol(z)
[1] 0.9148060 0.9259407 0.7126737
So each element of z[1, ] is divided by the corresponding element of the divisor vector, yielding the means of the aggregated cells of alphas[1, ].
Your loop simply does this for your whole z matrix.
By the way, a more convenient way to do this in R is to replace the explicit loop with a function call. Since you understand apply(), you will understand sapply(), which we will use after first defining a function.
FUN1 <- function(i){
z[i, ] / 1:ncol(z)
}
M <- t(sapply(1:nrow(z), FUN1))
> head(M, 3)
[,1] [,2] [,3]
[1,] 0.9148060 0.9259407 0.7126737
[2,] 0.8304476 0.7360966 0.6637630
[3,] 0.7365883 0.4356275 0.5094157
This yields the same as your loop, but in a more idiomatic R way.
We can do this in one step with
z <- t(sapply(seq_len(nrow(alphas)),
function(i) cumsum(alphas[i, ]) / seq_along(alphas[i, ])))
> head(z, 3)
[,1] [,2] [,3]
[1,] 0.9148060 0.9259407 0.7126737
[2,] 0.8304476 0.7360966 0.6637630
[3,] 0.7365883 0.4356275 0.5094157
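The cumulative means can also be sketched without sapply(), using sweep() to divide column j of the cumulative sums by j. This is an alternative formulation, not part of the answer above; the names z_loop and z_sweep are made up, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
alphas <- matrix(runif(900), ncol = 3, byrow = TRUE)

# Loop version, as in the question
z_loop <- t(apply(alphas, 1, cumsum))
for (i in seq_len(nrow(z_loop))) {
  z_loop[i, ] <- z_loop[i, ] / seq_len(ncol(z_loop))
}

# sweep() version: divide column j of the row-wise cumulative sums by j
z_sweep <- sweep(t(apply(alphas, 1, cumsum)), 2, seq_len(ncol(alphas)), "/")

# Both should hold the same row-wise cumulative means
all.equal(z_loop, z_sweep)
```

Here sweep() works along margin 2 because the divisor belongs to the column position, while the cumulative sum itself still runs along each row.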

Multiply a matrix' columns by its columns

I have a 4x100 matrix and would like to multiply column 1 by row 1 of its transpose, and so on, and store the resulting matrices somewhere so that I can take their sum later on.
I really don't know where to start, because the column-row multiplication gives 4x4 matrices, so I cannot store them in a matrix.
data:
mm num[1:4,1:100]
mm_t num[1:100,1:4]
I'm thinking of creating a list in some way
list1=list()
for(i in 1:100){
list1[i] <- mm[,i]%*%mm_t[i,]
}
but I think I need some more indices, because this just leaves me with a single number in each element.
First, your description of the data is not clear. Second, are you trying to multiply each value by itself, or to do matrix multiplication?
We create a 4x100 matrix and its transpose:
mm <- matrix(1:400, nrow = 4, ncol = 100)
mm.t <- t(mm)
Then we can do the matrix multiplication (which is what you did; you get a 4 x 4 matrix by the definition of matrix multiplication, https://www.wikiwand.com/en/Matrix_multiplication).
If we want to multiply each element by itself (so mm[1,1] by mm[1,1]) then:
mm * mm
This will result in 4x100 matrix where each value is the square of the original value.
If we want the matrix multiplication of each column with itself, then:
sapply(1:100, function(x) {
mm[, x] %*% mm[, x]
})
This results in 100 values: each one is the dot product of a column (a length-4 vector) with itself.
Let's start with some sample data. Please get in the habit of including things like this in your question:
nr = 4
nc = 100
set.seed(47)
mm = matrix(runif(nr * nc), nrow = nr)
Here's a working answer, very similar to your attempt:
result = list()
for (i in 1:ncol(mm)) result[[i]] = mm[, i] %*% t(mm[, i])
result[1:2]
# [[1]]
# [,1] [,2] [,3] [,4]
# [1,] 0.9544547 0.3653018 0.7439585 0.8035430
# [2,] 0.3653018 0.1398132 0.2847378 0.3075428
# [3,] 0.7439585 0.2847378 0.5798853 0.6263290
# [4,] 0.8035430 0.3075428 0.6263290 0.6764924
#
# [[2]]
# [,1] [,2] [,3] [,4]
# [1,] 0.3289532 0.3965557 0.2231443 0.2689613
# [2,] 0.3965557 0.4780511 0.2690022 0.3242351
# [3,] 0.2231443 0.2690022 0.1513691 0.1824490
# [4,] 0.2689613 0.3242351 0.1824490 0.2199103
As to why yours didn't work, we can experiment and see that indeed we get a number rather than a matrix. The reason is that when you subset a single row or column of a matrix, the dimensions are "dropped" and it is coerced to a plain vector. And when you matrix multiply two vectors, you get their dot product.
mmt = t(mm)
mm[, 1] %*% mmt[1, ]
# [,1]
# [1,] 2.350646
dim(mm[, 1])
# NULL
dim(mmt[1, ])
# NULL
We can avoid this by specifying drop = FALSE in the subsetting code:
dim(mmt[1, , drop = FALSE])
# [1] 1 4
So a slight modification of your attempt, just adding drop = FALSE, makes it work:
res2 = list()
for (i in 1:ncol(mm)) res2[[i]] = mm[, i] %*% mmt[i, , drop = FALSE]
identical(result, res2)
# [1] TRUE
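Since the question mentions summing the matrices later, here is a small sketch of that last step. It uses outer() for the column-by-column products and Reduce() for the sum; the names outer_list and total are made up. The sum of these outer products is the same as mm %*% t(mm), so the list can be skipped if only the total is needed.

```r
set.seed(47)
mm <- matrix(runif(4 * 100), nrow = 4)

# One 4 x 4 outer product per column of mm
outer_list <- lapply(seq_len(ncol(mm)), function(i) outer(mm[, i], mm[, i]))

# Element-wise sum of all 100 matrices
total <- Reduce(`+`, outer_list)

# The same total, computed in one matrix multiplication
all.equal(total, mm %*% t(mm))
```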

Matching and replacing with for loops

I am having issues with a for loop.
I am trying to take the elements a b c d from each pathway (pathway matrix), match them to the expression data (expression matrix), and put them into a new matrix that looks like the pathway matrix but contains the values from the expression matrix.
I am trying to achieve this final matrix:
a <- c("pathway","1","4","7","pathway-2","1","e","g","pathway-3","4","g","h")
pathway<-matrix(a,3,4, byrow=T)
I hope the code will be easier to understand than my wording.
a <- c("pathway","b","c","d","pathway-2","b","e","g","pathway-3","c","g","h")
pathway<-matrix(a,3,4, byrow=T)
b <- c("b",1,"c",4,"d",7)
expression<-matrix(b,3,2, byrow=T)
new<-matrix("a",3,4)
new[1:3,1]<-pathway[,1]
for (x in 1:nrow(expression)){
for (y in 1:ncol(pathway)){
if(expression[x,1]==pathway[x,y]){
new[x,y]<-expression[x,2]
}
}
}
Here is one way to do it. We match each column of pathway[, -1] against expression[, 1] and use the resulting positions to index the values in expression[, 2]. Elements that are not found return NA, so we locate them and restore them from the original matrix. Then cbind as usual to get the desired matrix.
new_m <- apply(pathway[, -1], 2, function(i) expression[,2][match(i, expression[,1])])
new_m[which(is.na(new_m))] <- pathway[,-1][which(is.na(new_m))]
cbind(pathway[,1], new_m)
# [,1] [,2] [,3] [,4]
#[1,] "pathway" "1" "4" "7"
#[2,] "pathway-2" "1" "e" "g"
#[3,] "pathway-3" "4" "g" "h"
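The same replacement can be sketched with a named lookup vector instead of match() followed by NA repair. This is an alternative, not the method above; the names expr_tbl (used here instead of expression to avoid masking the base function), lookup, body, hit and new_mat are made up for illustration.

```r
pathway <- matrix(c("pathway", "b", "c", "d",
                    "pathway-2", "b", "e", "g",
                    "pathway-3", "c", "g", "h"), 3, 4, byrow = TRUE)
expr_tbl <- matrix(c("b", 1, "c", 4, "d", 7), 3, 2, byrow = TRUE)

# Named vector: names are the keys, elements are the replacement values
lookup <- setNames(expr_tbl[, 2], expr_tbl[, 1])

body <- pathway[, -1]             # everything except the pathway label column
hit <- body %in% names(lookup)    # which cells have an expression value
body[hit] <- lookup[body[hit]]    # replace only those cells

new_mat <- cbind(pathway[, 1], body)
new_mat
```

Cells without an expression value are never touched, so no NA values have to be restored afterwards.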

How to combine subsequent list elements into a new list in R?

For example: I have a list of matrices, and I would like to evaluate their differences, sort of a 3-D diff. So if I have:
m1 <- matrix(1:4, ncol=2)
m2 <- matrix(5:8, ncol=2)
m3 <- matrix(9:12, ncol=2)
mat.list <- list(m1,m2,m3)
I want to obtain
mat.diff <- list(m2-m1, m3-m2)
The solution I found is the following:
mat.diff <- mapply(function (A,B) B-A, mat.list[-length(mat.list)], mat.list[-1], SIMPLIFY = FALSE)
Is there a nicer/built-in way to do this?
You can do this with just lapply or other ways of looping:
mat.diff <- lapply( tail( seq_along(mat.list), -1 ),
function(i) mat.list[[i]] - mat.list[[ i-1 ]] )
You can use combn to generate the index pairs of the matrices and apply a function to each combination. Non-consecutive pairs return NULL.
l <- mat.list ## assumption: l is the list of matrices from the question
combn(seq_along(l), 2, FUN = function(x)
  if (diff(x) == 1) ## apply only to consecutive indices
    l[[x[2]]] - l[[x[1]]],
  simplify = FALSE) ## to get a list
Using @Arun's data, I get:
[[1]]
[,1] [,2]
[1,] 4 4
[2,] 4 4
[[2]]
NULL
[[3]]
[,1] [,2]
[1,] 4 4
[2,] 4 4
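For comparison, here is a minimal sketch using Map(), which always returns a list and so needs no simplify argument. It pairs each matrix with its predecessor by dropping the first and the last element of the list, respectively.

```r
m1 <- matrix(1:4, ncol = 2)
m2 <- matrix(5:8, ncol = 2)
m3 <- matrix(9:12, ncol = 2)
mat.list <- list(m1, m2, m3)

# Subtract each matrix from its successor: (m2 - m1), (m3 - m2)
mat.diff <- Map(`-`, mat.list[-1], mat.list[-length(mat.list)])
str(mat.diff)
```

Unlike the combn approach, this produces no NULL entries, because only consecutive pairs are ever formed.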
