Transforming NA to specific arrays of a matrix in R

I have a matrix of the form,
mat <- matrix(1:25, 5,5)
that looks like the following:
Now, I need to transform this matrix in the form as shown below:
That is, I want to keep all elements of rows 2 and 4 as well as columns 2 and 4 and replace all other values with NA. This is just a simple example to explain the problem. My actual matrix size is about 2000 x 2000. Any help would be much appreciated.

Your first and second matrices are different in that the first one is filled as R would fill a matrix (i.e. column-major order) and the second is row-major.
Assuming that you meant to have identical matrices, your task can be addressed with simple matrix operations:
mat <- matrix(1:25, 5,5)
mat2 <- matrix(NA, 5,5)
mat2[c(2,4),] <- 1
mat2[,c(2,4)] <- 1
mat * mat2
[,1] [,2] [,3] [,4] [,5]
[1,] NA 6 NA 16 NA
[2,] 2 7 12 17 22
[3,] NA 8 NA 18 NA
[4,] 4 9 14 19 24
[5,] NA 10 NA 20 NA
If not, just transpose your initial matrix with t(mat) and follow the same approach as above.

mat = t(mat)
replace(x = mat, which((matrix(row(mat) %in% c(2, 4), NROW(mat), NCOL(mat)) |
matrix(col(mat) %in% c(2, 4), NROW(mat), NCOL(mat))) == FALSE,
arr.ind = TRUE), NA)
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 2 NA 4 NA
#[2,] 6 7 8 9 10
#[3,] NA 12 NA 14 NA
#[4,] 16 17 18 19 20
#[5,] NA 22 NA 24 NA
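Both answers above amount to building a mask of the cells to keep. As a minimal sketch (the function name `keep_rows_cols` is made up for illustration and does not come from either answer), the mask can be built directly from `row()` and `col()`. This avoids the multiplication, so it should also work for non-numeric matrices:

```r
# Hypothetical helper: keep the given rows and columns, set every other cell to NA.
keep_rows_cols <- function(m, rows, cols) {
  # TRUE for each cell that lies in a kept row or a kept column
  keep <- row(m) %in% rows | col(m) %in% cols
  m[!keep] <- NA
  m
}

mat <- matrix(1:25, 5, 5)
res <- keep_rows_cols(mat, rows = c(2, 4), cols = c(2, 4))
res
```

For a matrix of about 2000 x 2000 this creates a few temporary objects of the same size, which is usually acceptable.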

Related

Replacing diagonal elements using dplyr pipe

I want to replace the diagonal elements of a matrix in the middle of a piping process but can't figure out a way to do this. I know I can replace the diagonal elements using the diag() function, but I just don't know how to use diag() inside a piping process. Sample data is given below and I want the following steps put together in a piping process. Thanks in advance.
aa <- matrix(1:25, nrow =5)
diag(aa) <- NA
One option could be:
aa %>%
`diag<-`(., NA)
[,1] [,2] [,3] [,4] [,5]
[1,] NA 6 11 16 21
[2,] 2 NA 12 17 22
[3,] 3 8 NA 18 23
[4,] 4 9 14 NA 24
[5,] 5 10 15 20 NA
We could use replace with a logical condition
library(dplyr)
aa %>%
replace(., col(.) == row(.), NA)
Output:
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 6 11 16 21
#[2,] 2 NA 12 17 22
#[3,] 3 8 NA 18 23
#[4,] 4 9 14 NA 24
#[5,] 5 10 15 20 NA
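Another possibility, sketched here under the assumption that a small named helper is acceptable, is to wrap the replacement in a function that returns the modified matrix. The name `set_diag` is hypothetical and is not part of base R or dplyr:

```r
# Hypothetical helper: return a copy of m with its diagonal set to value.
set_diag <- function(m, value) {
  diag(m) <- value
  m
}

aa <- matrix(1:25, nrow = 5)
res <- set_diag(aa, NA)  # inside a pipeline this would read: aa %>% set_diag(NA)
res
```

Because the helper takes the matrix as its first argument and returns it, it can be placed at any step of a pipe.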

Picking top n% percent of elements from matrix rows, different number of elements on each row

I have a problem with picking the top n% largest and smallest elements
from each data matrix row. Specifically, I would like to find the column numbers of those top n% elements. This would not be a problem if each row had the same number of non-NA elements, but in this situation the number of picked elements is different for each row. Here's an example of the situation (the real data matrix is 195x1030 so I won't be using it here), where the top 40% are picked
data=
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 NA 100 98 200 78 80 35 NA 55
[2,] 32 67 15 73 NA 12 91 230 3 99
[3,] NA NA NA 45 53 26 112 64 80 41
[4,] 54 38 60 70 163 69 109 205 5 31
[5,] 107 28 296 254 30 40 NA 18 28 90
The resulting top 40% column number matrices should look like these (the number of picked elements is calculated by rounding down, as the function as.integer does)
largest= smallest=
[,1] [,2] [,3] [,4] [,1] [,2] [,3] [,4]
[1,] 5 3 4 NA [1,] 1 8 10 NA
[2,] 8 10 7 NA [2,] 9 6 3 NA
[3,] 7 9 NA NA [3,] 6 10 NA NA
[4,] 8 5 7 4 [4,] 9 10 2 1
[5,] 3 4 1 10 [5,] 8 9 2 5
So the top numbers are selected looking only at the non-NA elements of the rows. For example, the first row of the data matrix contains only 8 non-NA numbers and thus 40% * 8 = 3.2, rounded down to 3, elements are selected. This creates the NAs in the resulting matrices.
Once again, I tried using a for-loop (this code is to finding the largest 40%):
largest <- matrix(rep(NA, 20), nrow = 5)
for(i in 1:5){
largest[i,]<-order(data[i,], decreasing=T)[1:as.integer(0.4*nrow(data[complete.cases(data[,i]),]))]
}
but R returns an error: "number of items to replace is not a multiple of replacement length", which I think means that since not all the elements of the original largest matrix are replaced while looping, this for-loop can't be used. Am I right?
How could this sort of picking be done?
The following reproduces your expected output
# Determine number of columns for output matrix as
# maximum of 40% of all non-NA values per row
ncol <- max(floor(apply(mat, 1, function(x) sum(!is.na(x))) * 0.4))
# Top 40% largest
t(apply(mat, 1, function(x) {
n <- floor(sum(!is.na(x)) * 0.4)
# seq_len(n) is empty when n is 0, whereas 1:n would give c(1, 0)
replace(rep(NA, ncol), seq_len(n), order(x, decreasing = TRUE)[seq_len(n)])
}))
# [,1] [,2] [,3] [,4]
#[1,] 5 3 4 NA
#[2,] 8 10 7 NA
#[3,] 7 9 NA NA
#[4,] 8 5 7 4
#[5,] 3 4 1 NA
# Top 40% smallest
t(apply(mat, 1, function(x) {
n <- floor(sum(!is.na(x)) * 0.4)
# seq_len(n) is empty when n is 0, whereas 1:n would give c(1, 0)
replace(rep(NA, ncol), seq_len(n), order(x, decreasing = FALSE)[seq_len(n)])
}))
# [,1] [,2] [,3] [,4]
#[1,] 1 8 10 NA
#[2,] 9 6 3 NA
#[3,] 6 10 NA NA
#[4,] 9 10 2 1
#[5,] 8 2 9 NA
Explanation: We first determine the max number of columns for both output matrices; we then loop through mat row-by-row, determine the row-specific number n of non-NA entries corresponding to 40% of all non-NA numbers in that row, and return a column vector of the top 40% decreasing/increasing entries padded with NAs. Final transpose gives the expected output.
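The steps in this explanation can also be collected into one reusable function. The following is only a sketch: the name `top_cols` and its arguments are made up for illustration, and it uses a plain loop with `order(..., na.last = NA)` (which drops NAs) rather than the `apply()` call above.

```r
# Hypothetical helper: column indices of the top p share of non-NA values per row,
# ordered from most to least extreme and padded with NA on the right.
top_cols <- function(x, p, largest = TRUE) {
  n_keep <- floor(rowSums(!is.na(x)) * p)   # number of picks per row, rounded down
  out <- matrix(NA_integer_, nrow = nrow(x), ncol = max(n_keep))
  for (i in seq_len(nrow(x))) {
    idx <- order(x[i, ], decreasing = largest, na.last = NA)  # NAs are removed
    k <- seq_len(n_keep[i])                 # empty when nothing is picked
    out[i, k] <- idx[k]
  }
  out
}

# Example data from the question
mat <- matrix(c(  1,  NA, 100,  98, 200,  78,  80,  35, NA, 55,
                 32,  67,  15,  73,  NA,  12,  91, 230,  3, 99,
                 NA,  NA,  NA,  45,  53,  26, 112,  64, 80, 41,
                 54,  38,  60,  70, 163,  69, 109, 205,  5, 31,
                107,  28, 296, 254,  30,  40,  NA,  18, 28, 90),
              nrow = 5, byrow = TRUE)

largest  <- top_cols(mat, 0.4)
smallest <- top_cols(mat, 0.4, largest = FALSE)
```

Tied values are returned in column order, so for row 5 the two entries equal to 28 appear as columns 2 and 9.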
Posting my (less exact and very similar) answer as it is in form of a function, which might be handy:
toppct <- function(x, p, largest = TRUE){
t(apply(x, 1, function(y){
c(which(y %in% sort(y, decreasing = largest)[1:floor(length(which(!is.na(y)))*p)]),
rep(NA, floor(length(y)*p) - floor(length(which(!is.na(y)))*p)))
}))
}
This produces the output in the question, just without sorting the top percent positions. For smallest, just set largest = FALSE.
> toppct(mat, .4)
[,1] [,2] [,3] [,4]
[1,] 3 4 5 NA
[2,] 7 8 10 NA
[3,] 7 9 NA NA
[4,] 4 5 7 8
[5,] 1 3 4 NA
> toppct(mat, .4, largest = FALSE)
[,1] [,2] [,3] [,4]
[1,] 1 8 10 NA
[2,] 3 6 9 NA
[3,] 6 10 NA NA
[4,] 1 2 9 10
[5,] 2 8 9 NA
I want to emphasize that I think Maurits' answer is the one to accept, as he gets the output exactly as expected.

Multiplication of matrices with NA values

If I have 2 square Matrices with random NA values, for example:
Matrix A:
1 2 3
1 5 NA 7
2 NA 3 8
3 NA 4 5
Matrix B:
1 2 3
1 NA 8 NA
2 2 5 9
3 NA 4 3
What is the best way to multiply them? Would changing NA values to 0 give a different result of the dot product?
NAs are not ignored; they propagate through both element-wise and matrix multiplication:
## Dummy matrices
mat1 <- matrix(sample(1:9, 9), 3, 3)
mat2 <- matrix(sample(1:9, 9), 3, 3)
## Adding NAs
mat1[sample(1:9, 4)] <- NA
mat2[sample(1:9, 4)] <- NA
mat1
# [,1] [,2] [,3]
#[1,] 9 NA 3
#[2,] 2 NA NA
#[3,] NA 1 8
mat2
# [,1] [,2] [,3]
#[1,] NA NA 4
#[2,] NA 9 3
#[3,] NA 7 1
mat1 * mat2
# [,1] [,2] [,3]
#[1,] NA NA 12
#[2,] NA NA NA
#[3,] NA 7 8
mat1 %*% mat2
# [,1] [,2] [,3]
#[1,] NA NA NA
#[2,] NA NA NA
#[3,] NA NA NA
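In this case the matrix product contains only NAs because every row-by-column sum involves at least one NA. Different matrices can lead to different results. Replacing the NAs with 0 beforehand would therefore change the result, since those terms would then contribute 0 instead of turning the whole sum into NA.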
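To make the difference concrete for the matrices in the question, here is a small sketch comparing the product with NAs left in place against the product after zero-filling. The helper name `na_to_zero` is hypothetical, and whether treating NA as 0 is appropriate depends on what the missing values mean in your data.

```r
# Matrices A and B from the question
A <- matrix(c( 5, NA, 7,
              NA,  3, 8,
              NA,  4, 5), nrow = 3, byrow = TRUE)
B <- matrix(c(NA, 8, NA,
               2, 5,  9,
              NA, 4,  3), nrow = 3, byrow = TRUE)

# Hypothetical helper: copy of m with every NA replaced by 0
na_to_zero <- function(m) {
  m[is.na(m)] <- 0
  m
}

with_na     <- A %*% B                          # every row of A contains an NA
zero_filled <- na_to_zero(A) %*% na_to_zero(B)  # NA terms now contribute 0

with_na
zero_filled
```

Here `with_na` is entirely NA, while `zero_filled` contains ordinary numbers, so the two approaches do not give the same result.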

How to replace non diagonal elements of matrix by row?

I would like to replace non diagonal elements of matrix with a
sequence of numbers.
I managed to write this:
mat[outer(1:nrow(mat), 1:nrow(mat), function(i,j) j!=i)] <- 1:182
But it fills the numbers by column. I would rather not use the
transpose function, as I have specific row names which I would like to
keep.
Example
So if I have a matrix m
m <- matrix(NA, nrow=5, ncol=5, dimnames=list(letters[1:5], NULL))
m
# [,1] [,2] [,3] [,4] [,5]
# a NA NA NA NA NA
# b NA NA NA NA NA
# c NA NA NA NA NA
# d NA NA NA NA NA
# e NA NA NA NA NA
How can I add a sequence to the non-diagonals while keeping the rownames of the original matrix: expected output
# [,1] [,2] [,3] [,4] [,5]
# a NA 1 2 3 4
# b 5 NA 6 7 8
# c 9 10 NA 11 12
# d 13 14 15 NA 16
# e 17 18 19 20 NA
We can try
mat[lower.tri(mat, diag=FALSE)|upper.tri(mat, diag=FALSE)] <- 1:182
Or
mat[!diag(ncol(mat))] <- 1:182
Using a small example in OP's post
m[!diag(ncol(m))] <- 1:20
out <- t(m)
dimnames(out) <- rev(dimnames(out))
Used rev from @user20650's comment.
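If transposing is to be avoided altogether, one alternative sketch is to list the off-diagonal positions as (row, column) pairs, sort the pairs row by row, and assign through matrix indexing. This is not the approach from the answer above, just an illustration of the same fill order; the row names stay untouched because the matrix is modified in place.

```r
m <- matrix(NA, nrow = 5, ncol = 5, dimnames = list(letters[1:5], NULL))

# Positions of the off-diagonal cells as (row, col) pairs, in column-major order
idx <- which(row(m) != col(m), arr.ind = TRUE)
# Reorder the pairs so they run along each row instead of down each column
idx <- idx[order(idx[, "row"], idx[, "col"]), ]
# Indexing with a two-column matrix assigns one value per (row, col) pair
m[idx] <- 1:20
m
```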

create new matrix with new dimension and omitting NA values

I have a matrix with some NA values
for example:
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 NA 8 11
[3,] 3 6 NA 12
I want to create a new matrix from the data in my matrix above, with new dimensions and no NA values. (It is OK to have NA only in the last few elements.)
something like:
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 NA
[4,] 4 10 NA
I would appreciate if anyone can help me.
Thanks
Something like this as well:
m <- matrix(1:12, nc=4)
m[c(5, 9)] <- NA
matrix(c(na.omit(c(m)), rep(NA, sum(is.na(m)))), nrow=4)
m <- matrix(1:12, nc=4)
m[c(5, 9)] <- NA
# create an array of the appropriate class and dimension (filled with NA values)
dims <- c(4, 3)
md <- array(m[0], dim=dims)
# replace first "n" values with non-NA values from m
nonNAm <- na.omit(c(m))
md[seq_along(nonNAm)] <- nonNAm
md
# [,1] [,2] [,3]
# [1,] 1 6 11
# [2,] 2 7 12
# [3,] 3 8 NA
# [4,] 4 10 NA
Yet another attempt. This will keep the order of the values in column order as a matrix usually would. E.g.:
mat <- matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),nrow=3)
array(mat[order(is.na(mat))],dim=dim(mat))
# [,1] [,2] [,3] [,4]
#[1,] 1 4 8 12
#[2,] 2 6 10 NA
#[3,] 3 7 11 NA
Now change a value to check it doesn't affect the ordering.
mat[7] <- 20
array(mat[order(is.na(mat))],dim=dim(mat))
# [,1] [,2] [,3] [,4]
#[1,] 1 4 8 12
#[2,] 2 6 10 NA
#[3,] 3 20 11 NA
You can then specify whatever dimensions you feel like to the dim= argument:
array(mat[order(is.na(mat))],dim=c(4,3))
# [,1] [,2] [,3]
#[1,] 1 6 11
#[2,] 2 20 12
#[3,] 3 8 NA
#[4,] 4 10 NA
This is fairly straightforward if you want to preserve order column-wise or row-wise.
originalMatrix <- matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),nrow=3)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 NA 8 11
[3,] 3 6 NA 12
newMatrixNums <- originalMatrix[!is.na(originalMatrix)]
[1] 1 2 3 4 6 7 8 10 11 12
Pad with NA:
newMatrixNums2 <- c(newMatrixNums,rep(NA,2))
Column-wise:
matrix(newMatrixNums2,nrow=3)
[,1] [,2] [,3] [,4]
[1,] 1 4 8 12
[2,] 2 6 10 NA
[3,] 3 7 11 NA
Row-wise:
matrix(newMatrixNums2,nrow=3,byrow=T)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 6 7 8 10
[3,] 11 12 NA NA
Here's one way:
# Reproducing your data
m <- matrix(1:12, nc=4)
m[c(5, 9)] <- NA
# Your desired dimensions
dims <- c(4, 3)
array(c(na.omit(c(m)), rep(NA, prod(dims) - length(na.omit(c(m))))), dim=dims)
# [,1] [,2] [,3]
# [1,] 1 6 11
# [2,] 2 7 12
# [3,] 3 8 NA
# [4,] 4 10 NA
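This can do the job, though I don't know whether it is a good way.
# all cells of m as a vector, NA included (m[m] only has the right length by accident)
list1 <- c(m)
# the non-NA values only
list2 <- m[!is.na(m)]
element1 <- list2
# as many NAs as were dropped, used as padding at the end
element2 <- rep(NA, (length(list1)-length(list2)))
newm <- matrix(c(element1,element2), nrow=4)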
If you increase the length of a numeric vector with length(x)<- without assigning values to the new elements, the new values are given NA as their value. So length(M2) <- length(M) takes the shorter M2 vector and makes it the same length as M by adding NA values to the new elements.
## original
> (M <- matrix(c(1:4,NA,6:8,NA,10:12), nrow = 3))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 NA 8 11
# [3,] 3 6 NA 12
## new
> M2 <- M[!is.na(M)]; length(M2) <- length(M)
> matrix(M2, ncol(M))
# [,1] [,2] [,3]
# [1,] 1 6 11
# [2,] 2 7 12
# [3,] 3 8 NA
# [4,] 4 10 NA
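The answers above share one pattern: drop the NAs, pad the vector back to the target size, and reshape. A minimal sketch of that pattern as a function follows; the name `reshape_without_na` and its arguments are hypothetical, and it relies on the `length<-` padding behaviour described in the last answer.

```r
# Hypothetical helper: move all non-NA values to the front (column-major order)
# and reshape to nrow x ncol, padding the tail with NA.
reshape_without_na <- function(m, nrow, ncol) {
  vals <- m[!is.na(m)]
  # the new shape must have room for every non-NA value
  stopifnot(length(vals) <= nrow * ncol)
  length(vals) <- nrow * ncol   # extending a vector pads it with NA
  matrix(vals, nrow = nrow, ncol = ncol)
}

m <- matrix(1:12, ncol = 4)
m[c(5, 9)] <- NA
res <- reshape_without_na(m, nrow = 4, ncol = 3)
res
```

The explicit size check is a design choice: without it, a target shape that is too small would silently truncate values.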
