Remove rows in R matrix where all data is NA [duplicate] - r

Possible Duplicate:
Removing empty rows of a data file in R
How would I remove rows from a matrix or data frame where all elements in the row are NA?
So to get from this:
[,1] [,2] [,3]
[1,] 1 6 11
[2,] NA NA NA
[3,] 3 8 13
[4,] 4 NA NA
[5,] 5 10 NA
to this:
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 3 8 13
[3,] 4 NA NA
[4,] 5 10 NA
The problem with na.omit is that it removes rows containing any NA, so it would give me this:
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 3 8 13
The best I have been able to do so far is use the apply() function:
> x[apply(x, 1, function(y) !all(is.na(y))),]
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 3 8 13
[3,] 4 NA NA
[4,] 5 10 NA
but this seems quite convoluted. Is there something simpler that I am missing?
Thanks.

Solutions using rowSums() generally outperform apply() ones:
m <- structure(c( 1, NA, 3, 4, 5,
6, NA, 8, NA, 10,
11, NA, 13, NA, NA),
.Dim = c(5L, 3L))
m[rowSums(is.na(m)) != ncol(m), ]
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 3 8 13
[3,] 4 NA NA
[4,] 5 10 NA
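A minimal sketch of the rowSums() idea wrapped in a helper, assuming a hypothetical function name drop_all_na_rows (it is not part of base R or of the answer above). It adds drop = FALSE so the result stays two-dimensional even if only one row survives, and it should also work on a data frame, since is.na() returns a logical matrix for one:

```r
# Hypothetical helper: keep only rows that contain at least one non-NA value.
drop_all_na_rows <- function(x) {
  keep <- rowSums(!is.na(x)) > 0   # TRUE where a row has any observed value
  x[keep, , drop = FALSE]          # drop = FALSE preserves matrix/data frame shape
}

m <- matrix(c( 1, NA,  3,  4,  5,
               6, NA,  8, NA, 10,
              11, NA, 13, NA, NA), nrow = 5)

res_matrix <- drop_all_na_rows(m)                  # rows 1, 3, 4, 5 of m
res_df     <- drop_all_na_rows(as.data.frame(m))   # same rows, as a data frame
```

Rows that are only partly NA (rows 4 and 5 here) are kept, which is the difference from na.omit().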

Sweep a test for all(is.na()) across rows, and remove where true. Something like this (untested as you provided no code to generate your data -- dput() is your friend):
R> ind <- apply(X, 1, function(x) all(is.na(x)))
R> X <- X[ !ind, ]
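To check on your own machine whether rowSums() really beats apply() here, something like the following could be used. The matrix size and the NA pattern are arbitrary assumptions, and timings vary between machines, so no numbers are shown:

```r
set.seed(1)

# Assumed test data: 100,000 x 10 numeric matrix with some all-NA rows
# and some scattered single NAs.
big <- matrix(rnorm(1e5 * 10), ncol = 10)
big[sample(nrow(big), 1e4), ] <- NA
big[sample(length(big), 1e4)] <- NA

# Row-wise test with apply()
t_apply <- system.time(
  keep_apply <- !apply(big, 1, function(r) all(is.na(r)))
)

# Vectorised test with rowSums()
t_rowsums <- system.time(
  keep_rowsums <- rowSums(is.na(big)) != ncol(big)
)

# Both approaches should select exactly the same rows
same_rows <- identical(keep_apply, keep_rowsums)

print(rbind(apply = t_apply, rowSums = t_rowsums)[, "elapsed"])
```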

Related

Get rid of consecutive NA in columns of a matrix

I asked the same question here but it was closed as my post has been associated with similar questions although they are not related to my question and don't resolve it.
The dataset:
I have a huge data set saved in a matrix with more than one million rows and a dozen columns.
The matrix looks like
data <- matrix(c(1, NA, 2, NA, 1, NA, NA, NA, 1, NA, 3, NA, 5, NA, NA, NA, 8, NA, 5, NA, 7, NA, NA, NA), ncol=3)
> data
[,1] [,2] [,3]
[1,] 1 1 8
[2,] NA NA NA
[3,] 2 3 5
[4,] NA NA NA
[5,] 1 5 7
[6,] NA NA NA
[7,] NA NA NA
[8,] NA NA NA
So if there is a missing value in certain column, then necessarily other columns will have missing values for the same row.
The question:
I would like to "efficiently" delete runs of consecutive missing values when a run is 3 or more rows long in each column, for all columns of the matrix. That is, I want to delete NAs that are consecutive down a column, not across a row.
I have already seen solutions, like this one, for my question, but they were too slow for my huge data set. Do you have other suggestions that achieve this efficiently? Additionally, the suggested answers (1 & 2) to my closed question delete missing values that are consecutive within rows, not columns.
EDIT:
Following the comment below, the output must be like this:
[,1] [,2] [,3]
[1,] 1 1 8
[2,] NA NA NA
[3,] 2 3 5
[4,] NA NA NA
[5,] 1 5 7
EDIT:
> data
[,1] [,2] [,3] [,4]
[1,] 1 1 8 NA
[2,] NA NA NA NA
[3,] 2 3 5 NA
[4,] NA NA NA NA
[5,] 1 5 7 NA
[6,] NA NA NA NA
[7,] NA NA NA NA
[8,] NA NA NA NA
The expected output
[,1] [,2] [,3]
[1,] 1 1 8
[2,] NA NA NA
[3,] 2 3 5
[4,] NA NA NA
[5,] 1 5 7
If the NAs are consecutive, then rle() can be used:
i1 <- rowSums(is.na(data)) > 0
# or, because NAs span whole rows here, checking the first column is enough
i1 <- is.na(data[,1])
data[!inverse.rle(within.list(rle(i1), {
values[values & lengths < 3] <- FALSE})),]
-output
# [,1] [,2] [,3]
#[1,] 1 1 8
#[2,] NA NA NA
#[3,] 2 3 5
#[4,] NA NA NA
#[5,] 1 5 7
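To make the rle() step less opaque, here is a small walk-through on the logical vector that is.na(data[, 1]) produces for the example matrix. The variable names r and drop are illustrative only:

```r
# is.na() of the first column of the example matrix
i1 <- c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

r <- rle(i1)
r$lengths
# [1] 1 1 1 1 1 3
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# Runs of NA shorter than 3 are relabelled as "not to be dropped"
r$values[r$values & r$lengths < 3] <- FALSE

# Expand the runs back to one flag per row: TRUE marks rows to remove
drop <- inverse.rle(r)
drop
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

Only the final run of three NAs is flagged, so indexing with !drop keeps rows 1 to 5.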
Update
If we have a particular column with all NAs, then we can remove it first
data1 <- data[,colSums(!is.na(data)) != 0]
and now we apply the previous code on the selected column data
i1 <- is.na(data1[,1])
data1[!inverse.rle(within.list(rle(i1), {
values[values & lengths < 3] <- FALSE})),]
Or we may use rleid from data.table (which would be more efficient)
library(data.table)
data[as.data.table(data)[, .I[!(.N >=3 & is.na(V1))],
rleid(is.na(V1))]$V1,]
if there is a missing value in certain column, then necessarily other columns will have missing values for the same row.
I think this is very important information: we can take advantage of it and work with any one column instead of the complete dataset. Try:
vec <- data[, 1]
data[!with(rle(is.na(vec)), rep(values & lengths >= 3, lengths)), ]
# [,1] [,2] [,3]
#[1,] 1 1 8
#[2,] NA NA NA
#[3,] 2 3 5
#[4,] NA NA NA
#[5,] 1 5 7
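The rle() approaches above can be collected into one reusable function. This is only a sketch: the name drop_na_runs and the min_run argument are hypothetical, and it flags a row as missing when all of its columns are NA rather than looking at a single column, which is an assumption that matches the example data:

```r
# Hypothetical helper: remove runs of at least `min_run` consecutive all-NA rows.
drop_na_runs <- function(x, min_run = 3) {
  all_na <- rowSums(!is.na(x)) == 0          # TRUE for rows with no observed value
  runs   <- rle(all_na)
  # Flag every row belonging to a sufficiently long run of all-NA rows
  drop   <- rep(runs$values & runs$lengths >= min_run, runs$lengths)
  x[!drop, , drop = FALSE]
}

data <- matrix(c(1, NA, 2, NA, 1, NA, NA, NA,
                 1, NA, 3, NA, 5, NA, NA, NA,
                 8, NA, 5, NA, 7, NA, NA, NA), ncol = 3)

out <- drop_na_runs(data)
out
#      [,1] [,2] [,3]
# [1,]    1    1    8
# [2,]   NA   NA   NA
# [3,]    2    3    5
# [4,]   NA   NA   NA
# [5,]    1    5    7
```

Because the work is a couple of vectorised passes over the rows, it should scale to millions of rows, though that has not been measured here.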

Multiplication of matrices with NA values

If I have 2 square Matrices with random NA values, for example:
Matrix A:
     1   2   3
1    5  NA   7
2   NA   3   8
3   NA   4   5
Matrix B:
     1   2   3
1   NA   8  NA
2    2   5   9
3   NA   4   3
What is the best way to multiply them? Would changing NA values to 0 give a different result of the dot product?
NAs are not ignored; they propagate through both element-wise and matrix multiplication:
## Dummy matrices
mat1 <- matrix(sample(1:9, 9), 3, 3)
mat2 <- matrix(sample(1:9, 9), 3, 3)
## Adding NAs
mat1[sample(1:9, 4)] <- NA
mat2[sample(1:9, 4)] <- NA
mat1
# [,1] [,2] [,3]
#[1,] 9 NA 3
#[2,] 2 NA NA
#[3,] NA 1 8
mat2
# [,1] [,2] [,3]
#[1,] NA NA 4
#[2,] NA 9 3
#[3,] NA 7 1
mat1 * mat2
# [,1] [,2] [,3]
#[1,] NA NA 12
#[2,] NA NA NA
#[3,] NA 7 8
mat1 %*% mat2
# [,1] [,2] [,3]
#[1,] NA NA NA
#[2,] NA NA NA
#[3,] NA NA NA
In this case the matrix product contains only NAs because every entry is a sum that involves at least one NA. Different matrices can lead to different results.
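On the second part of the question: replacing NA with 0 does change the result, because an NA term makes the whole sum NA, while a 0 term simply contributes nothing. A small sketch using the matrices A and B from the question (whether treating missing values as zero is meaningful depends on the data, which is an assumption here):

```r
A <- matrix(c( 5, NA, 7,
              NA,  3, 8,
              NA,  4, 5), nrow = 3, byrow = TRUE)
B <- matrix(c(NA, 8, NA,
               2, 5,  9,
              NA, 4,  3), nrow = 3, byrow = TRUE)

# With NAs left in place, every entry of the product is NA for these inputs
with_na <- A %*% B

# Replace NA by 0 first, then multiply
A0 <- replace(A, is.na(A), 0)
B0 <- replace(B, is.na(B), 0)
with_zero <- A0 %*% B0
with_zero
#      [,1] [,2] [,3]
# [1,]    0   68   21
# [2,]    6   47   51
# [3,]    8   40   51
```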

Transforming NA to specific arrays of a matrix in R

I have a matrix of the form,
mat <- matrix(1:25, 5,5)
that looks like the following:
Now, I need to transform this matrix in the form as shown below:
That is, I want to keep all elements of rows 2 and 4 as well as columns 2 and 4, and replace all other values with NA. This is just a simple example to explain the problem. My actual matrix size is about 2000 x 2000. Any help would be much appreciated.
Your first and second matrices are different in that the first one is filled as R would fill a matrix (i.e. column-major order) and the second is row-major.
Assuming that you meant to have identical matrices, your task can be addressed with simple matrix operations:
mat <- matrix(1:25, 5,5)
mat2 <- matrix(NA, 5,5)
mat2[c(2,4),] <- 1
mat2[,c(2,4)] <- 1
mat * mat2
[,1] [,2] [,3] [,4] [,5]
[1,] NA 6 NA 16 NA
[2,] 2 7 12 17 22
[3,] NA 8 NA 18 NA
[4,] 4 9 14 19 24
[5,] NA 10 NA 20 NA
If not, just transpose your initial matrix with t(mat) and follow the same approach as above.
mat = t(mat)
replace(x = mat, which((matrix(row(mat) %in% c(2, 4), NROW(mat), NCOL(mat)) |
matrix(col(mat) %in% c(2, 4), NROW(mat), NCOL(mat))) == FALSE,
arr.ind = TRUE), NA)
# [,1] [,2] [,3] [,4] [,5]
#[1,] NA 2 NA 4 NA
#[2,] 6 7 8 9 10
#[3,] NA 12 NA 14 NA
#[4,] 16 17 18 19 20
#[5,] NA 22 NA 24 NA
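Another way to get the same result is negative indexing, which avoids building a second matrix. This is a sketch under the assumption that the rows and columns to keep are the same set (2 and 4, as in the question); the name keep is illustrative:

```r
mat  <- matrix(1:25, 5, 5)
keep <- c(2, 4)

out <- mat
# Cells that are in neither a kept row nor a kept column become NA
out[-keep, -keep] <- NA
out
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   NA    6   NA   16   NA
# [2,]    2    7   12   17   22
# [3,]   NA    8   NA   18   NA
# [4,]    4    9   14   19   24
# [5,]   NA   10   NA   20   NA
```

The assignment touches only the cells being blanked, so it should behave the same way on a 2000 x 2000 matrix.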

create new matrix with new dimension and omitting NA values

I have a matrix with some NA values
for example:
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 NA 8 11
[3,] 3 6 NA 12
I want to create a new matrix from the data in the matrix above, with new dimensions and no NA values. (It is OK to have NA in the last few elements only.)
something like:
[,1] [,2] [,3]
[1,] 1 6 11
[2,] 2 7 12
[3,] 3 8 NA
[4,] 4 10 NA
I would appreciate if anyone can help me.
Thanks
Something like this as well:
m <- matrix(1:12, nc=4)
m[c(5, 9)] <- NA
matrix(c(na.omit(c(m)), rep(NA, sum(is.na(m)))), nrow=4)
m <- matrix(1:12, nc=4)
m[c(5, 9)] <- NA
# create an array of the appropriate class and dimension (filled with NA values)
dims <- c(4, 3)
md <- array(m[0], dim=dims)
# replace first "n" values with non-NA values from m
nonNAm <- na.omit(c(m))
md[seq_along(nonNAm)] <- nonNAm
md
# [,1] [,2] [,3]
# [1,] 1 6 11
# [2,] 2 7 12
# [3,] 3 8 NA
# [4,] 4 10 NA
Yet another attempt. This will keep the order of the values in column order as a matrix usually would. E.g.:
mat <- matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),nrow=3)
array(mat[order(is.na(mat))],dim=dim(mat))
# [,1] [,2] [,3] [,4]
#[1,] 1 4 8 12
#[2,] 2 6 10 NA
#[3,] 3 7 11 NA
Now change a value to check it doesn't affect the ordering.
mat[7] <- 20
array(mat[order(is.na(mat))],dim=dim(mat))
# [,1] [,2] [,3] [,4]
#[1,] 1 4 8 12
#[2,] 2 6 10 NA
#[3,] 3 20 11 NA
You can then specify whatever dimensions you feel like to the dim= argument:
array(mat[order(is.na(mat))],dim=c(4,3))
# [,1] [,2] [,3]
#[1,] 1 6 11
#[2,] 2 20 12
#[3,] 3 8 NA
#[4,] 4 10 NA
This is fairly straightforward if you want to preserve order column-wise or row-wise.
originalMatrix <- matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),nrow=3)
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 NA 8 11
[3,] 3 6 NA 12
newMatrixNums <- originalMatrix[!is.na(originalMatrix)]
[1] 1 2 3 4 6 7 8 10 11 12
Pad with NA:
newMatrixNums2 <- c(newMatrixNums,rep(NA,2))
Column-wise:
matrix(newMatrixNums2,nrow=3)
[,1] [,2] [,3] [,4]
[1,] 1 4 8 12
[2,] 2 6 10 NA
[3,] 3 7 11 NA
Row-wise:
matrix(newMatrixNums2,nrow=3,byrow=T)
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 6 7 8 10
[3,] 11 12 NA NA
Here's one way:
# Reproducing your data
m <- matrix(1:12, nc=4)
m[c(5, 9)] <- NA
# Your desired dimensions
dims <- c(4, 3)
array(c(na.omit(c(m)), rep(NA, prod(dims) - length(na.omit(c(m))))), dim=dims)
# [,1] [,2] [,3]
# [1,] 1 6 11
# [2,] 2 7 12
# [3,] 3 8 NA
# [4,] 4 10 NA
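This does the job, though I am not sure whether it is a good way.
list1 <- c(m)              # all elements of m, including the NAs
list2 <- m[!is.na(m)]      # non-NA elements only, in column order
element1 <- list2
element2 <- rep(NA, (length(list1)-length(list2)))   # NA padding for the tail
newm <- matrix(c(element1,element2), nrow=4)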
If you increase the length of a numeric vector with length(x)<- without assigning values to the new elements, the new values are given NA as their value. So length(M2) <- length(M) takes the shorter M2 vector and makes it the same length as M by adding NA values to the new elements.
## original
> (M <- matrix(c(1:4,NA,6:8,NA,10:12), nrow = 3))
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 NA 8 11
# [3,] 3 6 NA 12
## new
> M2 <- M[!is.na(M)]; length(M2) <- length(M)
> matrix(M2, nrow = ncol(M))
# [,1] [,2] [,3]
# [1,] 1 6 11
# [2,] 2 7 12
# [3,] 3 8 NA
# [4,] 4 10 NA
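The answers above share one pattern: take the non-NA values in column order, pad with NA, and reshape. A sketch of that pattern as a single function, assuming a hypothetical name repack_without_na and a guard for target dimensions that are too small:

```r
# Hypothetical helper: drop NAs, then refill a new shape column by column.
repack_without_na <- function(m, dims) {
  vals <- m[!is.na(m)]                     # non-NA values in column-major order
  stopifnot(length(vals) <= prod(dims))    # the new shape must hold all values
  length(vals) <- prod(dims)               # extending a vector pads it with NA
  array(vals, dim = dims)
}

m <- matrix(1:12, ncol = 4)
m[c(5, 9)] <- NA

out <- repack_without_na(m, c(4, 3))
out
#      [,1] [,2] [,3]
# [1,]    1    6   11
# [2,]    2    7   12
# [3,]    3    8   NA
# [4,]    4   10   NA
```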

R Create Matrix From an Operation on a "Row" Vector and a "Column" Vector

First create a "row" vector and a "column" vector in R:
> row.vector <- seq(from = 1, length = 4, by = 1)
> col.vector <- {t(seq(from = 1, length = 3, by = 2))}
From that I'd like to create a matrix by, e.g., multiplying each value in the row vector with each value in the column vector, thus creating from just those two vectors:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 6 10
[3,] 3 9 15
[4,] 4 12 20
Can this be done with somehow using apply()? sweep()? ...a for loop?
Thank you for any help!
Simple matrix multiplication will work just fine
row.vector %*% col.vector
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 6 10
# [3,] 3 9 15
# [4,] 4 12 20
You'd be better off working with two actual vectors, instead of a vector and a matrix:
outer(row.vector,as.vector(col.vector))
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 2 6 10
#[3,] 3 9 15
#[4,] 4 12 20
Here's a way to get there with apply. Is there a reason why you're not using matrix?
> apply(col.vector, 2, function(x) row.vector * x)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 6 10
## [3,] 3 9 15
## [4,] 4 12 20
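Since the question mentions multiplication only as an example, it may help to note that outer() accepts any vectorised function through its FUN argument. A short sketch, using plain vectors for both inputs (an assumption; the question stores col.vector as a 1 x 3 matrix):

```r
row.vector <- seq(from = 1, length.out = 4, by = 1)   # 1 2 3 4
col.vector <- seq(from = 1, length.out = 3, by = 2)   # 1 3 5

# Default FUN is "*", giving the table from the question
prod_mat <- outer(row.vector, col.vector)

# Any vectorised binary operation can be substituted
sum_mat    <- outer(row.vector, col.vector, FUN = "+")
custom_mat <- outer(row.vector, col.vector, function(r, k) r^2 + k)
```

Entry [i, j] of each result is FUN applied to row.vector[i] and col.vector[j].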
