Remove NA/NaN/Inf in a matrix - r

I want to do two things:
How do I remove rows that contain NA/NaN/Inf?
How do I set the value of a data point from NA/NaN/Inf to 0?
So far, I have tried the following for NA values, but I get an error and a warning.
> eg <- data[rowSums(is.na(data)) == 0,]
Error in rowSums(is.na(data)) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In is.na(data) : is.na() applied to non-(list or vector) of type 'closure'
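A note on the error itself: the message mentions type 'closure', which is what R reports when is.na() is given a function. A likely cause (an assumption, since the question does not show how data was created) is that no object named data exists in the workspace, so the name resolves to the built-in data() function from utils. A minimal sketch, using a hypothetical matrix my_mat:

```r
# In a fresh session the name `data` refers to a function from the utils package
is.function(utils::data)   # TRUE
typeof(utils::data)        # "closure"

# Hypothetical example matrix standing in for the asker's data
my_mat <- matrix(c(1, NA, 3, 4), nrow = 2)

# The same expression works once it is applied to an actual matrix;
# drop = FALSE keeps the result a matrix even if a single row remains
clean <- my_mat[rowSums(is.na(my_mat)) == 0, , drop = FALSE]
clean
```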

I guess I'll throw my hat into the ring with my preferred methods:
# sample data
m <- matrix(c(1,2,NA,NaN,1,Inf,-1,1,9,3),5)
# remove all rows with non-finite values
m[!rowSums(!is.finite(m)),]
# replace all non-finite values with 0
m[!is.finite(m)] <- 0
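One caveat with the row filter above: if only one row survives, `[` drops the result to a plain vector. A small sketch of the same approach with drop = FALSE added (the names keep, m_clean and m_zero are illustrative, not from the answer):

```r
m <- matrix(c(1, 2, NA, NaN, 1, Inf, -1, 1, 9, 3), nrow = 5)

# TRUE for rows in which every entry is finite (no NA, NaN, Inf or -Inf)
keep <- rowSums(!is.finite(m)) == 0

# drop = FALSE guarantees a matrix, even when a single row is kept
m_clean <- m[keep, , drop = FALSE]

# Replace non-finite entries with 0 in a copy, leaving m untouched
m_zero <- m
m_zero[!is.finite(m_zero)] <- 0
```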

library(functional)
m[apply(m, 1, Compose(is.finite, all)),]
Demonstration:
m <- matrix(c(1,2,3,NA,4,5), 3)
m
## [,1] [,2]
## [1,] 1 NA
## [2,] 2 4
## [3,] 3 5
m[apply(m, 1, Compose(is.finite, all)),]
## [,1] [,2]
## [1,] 2 4
## [2,] 3 5
Note: Compose(is.finite, all) is equivalent to function(x) all(is.finite(x))
To set the values to 0, use matrix indexing:
m[!is.finite(m)] <- 0
m
## [,1] [,2]
## [1,] 1 0
## [2,] 2 4
## [3,] 3 5

NaRV.omit(x) is my preferred option for question 1. Mnemonic NaRV means "not a regular value".
require(IDPmisc)
m <- matrix(c(1,2,3,NA,5, NaN, 7, 8, 9, Inf, 11, 12, -Inf, 14, 15), 5)
> m
[,1] [,2] [,3]
[1,] 1 NaN 11
[2,] 2 7 12
[3,] 3 8 -Inf
[4,] NA 9 14
[5,] 5 Inf 15
> NaRV.omit(m)
[,1] [,2] [,3]
[1,] 2 7 12
attr(,"na.action")
[1] 1 3 4 5
attr(,"class")
[1] "omit"

Just another way (for the first question):
m <- structure(c(1, 2, 3, NA, 4, 5, Inf, 5, 6, NaN, 7, 8),
.Dim = c(4L, 3L))
# [,1] [,2] [,3]
# [1,] 1 4 6
# [2,] 2 5 NaN
# [3,] 3 Inf 7
# [4,] NA 5 8
m[complete.cases(m * 0), , drop=FALSE]
# [,1] [,2] [,3]
# [1,] 1 4 6
I can't think of anything else other than Matthew's answer for the second part.
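Why multiplying by 0 works: finite numbers become 0, while NA stays NA and NaN, Inf and -Inf all become NaN, which complete.cases() treats as missing. A short sketch comparing it with an explicit is.finite() check (keep_cc and keep_fin are illustrative names):

```r
# Non-finite values do not survive multiplication by zero
c(1, NA, NaN, Inf, -Inf) * 0

m <- structure(c(1, 2, 3, NA, 4, 5, Inf, 5, 6, NaN, 7, 8),
               .Dim = c(4L, 3L))

# Row selector via complete.cases() on the zeroed matrix
keep_cc <- complete.cases(m * 0)

# Equivalent selector written out with is.finite()
keep_fin <- apply(m, 1, function(r) all(is.finite(r)))

res <- m[keep_cc, , drop = FALSE]
```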

Related

Solve linear equation system b=0

I found this code to solve a linear equation system with b=0, but I would like to know why only one column is returned for the first matrix and two columns are returned for the second.
library(MASS)
Null(t(A))
R > (A <- matrix(c(1,2,3,2,4,7), ncol = 3, byrow = TRUE))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 7
R > Null(t(A))
[,1]
[1,] -8.944272e-01
[2,] 4.472136e-01
[3,] 7.771561e-16
R > (A <- matrix(c(1,2,3,2,4,6), ncol = 3, byrow = TRUE))
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 4 6
R > Null(t(A))
[,1] [,2]
[1,] -0.5345225 -0.8017837
[2,] 0.7745419 -0.3381871
[3,] -0.3381871 0.4927193
library(MASS)
A <- matrix(c(1,2,3,2,4,7), ncol = 3, byrow = T)
t(A)
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 2 4
#> [3,] 3 7
B <- matrix(c(1,2,3,2,4,6), ncol = 3, byrow = T)
t(B)
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 2 4
#> [3,] 3 6
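From the above, you can see that in your last case all the rows of t(B) are multiples of one another, while in your first case only two of the rows of t(A) are (rows 1 and 2).
So the matrices have rank 2 vs 1. With three unknowns, the null space therefore has dimension 3 - 2 = 1 vs 3 - 1 = 2, which is why Null() returns 1 column vs 2.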
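The link between rank and the number of returned columns can be checked directly. The sketch below uses qr() to estimate the rank (a numerical estimate that depends on its tolerance) and confirms that each basis returned by Null() is mapped to zero; the names rank_A, N_A and so on are illustrative:

```r
library(MASS)

A <- matrix(c(1, 2, 3, 2, 4, 7), ncol = 3, byrow = TRUE)
B <- matrix(c(1, 2, 3, 2, 4, 6), ncol = 3, byrow = TRUE)

# Numerical rank of each coefficient matrix
rank_A <- qr(A)$rank
rank_B <- qr(B)$rank

# Bases for the solution sets of A x = 0 and B x = 0
N_A <- Null(t(A))
N_B <- Null(t(B))

# Number of basis vectors = number of unknowns - rank
c(ncol(A) - rank_A, ncol(N_A))
c(ncol(B) - rank_B, ncol(N_B))

# Each basis vector solves the homogeneous system up to rounding error
max(abs(A %*% N_A))
max(abs(B %*% N_B))
```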

Construct and inverse matrix from list using R

I have a relationship matrix generated from GCTA, that I can import into R using the following function
ReadGRMBin=function(prefix, AllN=F, size=4){
  sum_i=function(i){
    return(sum(1:i))
  }
  BinFileName=paste(prefix,".grm.bin",sep="")
  NFileName=paste(prefix,".grm.N.bin",sep="")
  IDFileName=paste(prefix,".grm.id",sep="")
  id = read.table(IDFileName)
  n=dim(id)[1]
  BinFile=file(BinFileName, "rb");
  grm=readBin(BinFile, n=n*(n+1)/2, what=numeric(0), size=size)
  NFile=file(NFileName, "rb");
  if(AllN==T){
    N=readBin(NFile, n=n*(n+1)/2, what=numeric(0), size=size)
  }
  else N=readBin(NFile, n=1, what=numeric(0), size=size)
  i=sapply(1:n, sum_i)
  return(list(diag=grm[i], off=grm[-i], id=id, N=N))
}
It then lists the off diagonal and diagonal.
$ diag: num [1:850] 0.878 0.815 1.11 1.161 1.062 ...
$ off : num [1:360825] 0.0181 -0.0304 -0.0663 -0.0211 -0.0583 ...
$ n : int 850
I wish to build a GRM from this that I can invert, ideally with the output in row, column, value format.
I have tried the following code, but it doesn't fill in the off-diagonal values in the correct order.
m <- matrix(NA, ncol = length(grm$diag), nrow = length(grm$diag))
m[lower.tri(m)] <- grm$off
m[upper.tri(m)] <- t(m)[upper.tri(t(m))]
diag(m) <- grm$diag
m
want=cbind(which(!is.na(m),arr.ind = TRUE),na.omit(as.vector(m)))
Instead of reading the off-diagonal values in the order
2 1, 3 1, 3 2, 4 1, 4 2 etc
it reads them down each column, as
2 1, 3 1, 4 1, 5 1, 6 1 etc
So the resulting matrix (shortened) ends up like this
[,1] [,2] [,3] [,4] [,5]
[1,] 0.87798703 0.018129893 -0.03044302 -0.066282429 -0.02106927
[2,] 0.01812989 0.814602911 0.07577287 -0.004078172 -0.03182918
[3,] -0.03044302 0.075772874 1.10976517 -0.055698857 -0.03960679
[4,] -0.06628243 -0.004078172 -0.05569886 1.160611629 -0.01021352
[5,] -0.02106927 -0.031829182 -0.03960679 -0.010213521 1.06245303
whereas what I want is this
[,1] [,2] [,3] [,4] [,5]
[1,] 0.87798703 0.018129893 -0.03044302 -0.02106927 -0.04011643
[2,] 0.01812989 0.814602911 -0.06628243 -0.00582625 -0.06237402
[3,] -0.03044302 -0.06628243 1.10976517 0.1315616 -0.1601102
[4,] -0.02106927 -0.00582625 0.1315616 1.160611629 -0.1388046
[5,] -0.04011643 -0.06237402 -0.1601102 -0.1388046 1.06245303
If you know how to amend the above code to give the wanted format, it would be much appreciated.
The desired end result is the inverse of the matrix in long format (row, column, value), if possible. Thanks
1 1 12456
1 2 78910
1 3 34568
1 4 68942
One simple solution is to adapt your code to fill the upper triangle before the lower (since it is the upper triangle that should be filled in column order):
grm = list(
diag = 1:5 * 11,
off = 0:9)
m <- diag(grm$diag)
m[upper.tri(m)] <- grm$off
m[lower.tri(m)] <- t(m)[lower.tri(t(m))]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 11 0 1 3 6
# [2,] 0 22 2 4 7
# [3,] 1 2 33 5 8
# [4,] 3 4 5 44 9
# [5,] 6 7 8 9 55
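For the second half of the question (the inverse in row, column, value form), one possible continuation is sketched below. It assumes the assembled matrix is non-singular so that solve() succeeds, which holds for this toy example but may not hold for a real GRM; the names m_inv and long are illustrative:

```r
# Toy data standing in for the list returned by ReadGRMBin()
grm <- list(diag = 1:5 * 11, off = 0:9)

# Assemble the symmetric matrix: upper triangle first, then mirror it
m <- diag(grm$diag)
m[upper.tri(m)] <- grm$off
m[lower.tri(m)] <- t(m)[lower.tri(m)]

# Invert; solve() stops with an error if the matrix is singular
m_inv <- solve(m)

# Long format: one line per (row, column) pair
long <- data.frame(
  row   = as.vector(row(m_inv)),
  col   = as.vector(col(m_inv)),
  value = as.vector(m_inv)
)

# Order as 1 1, 1 2, 1 3, ... to match the layout in the question
long <- long[order(long$row, long$col), ]
head(long, 4)
```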

understanding elements in a matrix

I'm trying to understand more about matrices in R and I stumbled upon this query. Could someone explain to me why x[2:3, 2:2] returns 5 and 6?
Am I right to say that 2:3, 2:2 simply refers to row 2, column 2 and row 3, column 2?
> x <- matrix(c(1:6), ncol = 2)
> x
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
>
>
> x[2:3, 2:2]
[1] 5 6
If that's the case, why is it that having a 3x3 matrix returns such a strange value?
> x <- matrix(c(1:9), ncol = 3)
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> x[1:2, 3:2, drop = F]
[,1] [,2]
[1,] 7 4
[2,] 8 5
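A short sketch of what these index expressions expand to may help. 2:2 is just 2, so x[2:3, 2:2] selects rows 2 and 3 of column 2. 3:2 counts downwards and is the same as c(3, 2), so the columns come back in reverse order: column 3 first, then column 2. The drop = F argument only controls whether a result with a single row or column is simplified to a vector:

```r
x <- matrix(1:9, ncol = 3)

2:2   # a single index: 2
3:2   # a descending sequence: 3 2

# Rows 1 and 2, taking column 3 first and column 2 second
a <- x[1:2, 3:2, drop = FALSE]
b <- x[1:2, c(3, 2)]

# Selecting a single column: a vector by default, a matrix with drop = FALSE
v  <- x[2:3, 2]
v2 <- x[2:3, 2, drop = FALSE]
```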

Setting matrix values comparing to vector in R

I want to set NA's in every element of a matrix where the value in a column is greater than or equal to the value of a given vector. For example, I can create a matrix:
set.seed(1)
zz <- matrix(data = round(10L * runif(12)), nrow = 4, ncol = 3)
which gives for zz:
[,1] [,2] [,3]
[1,] 8 5 7
[2,] 6 5 1
[3,] 5 10 3
[4,] 9 1 9
and for the comparison vector (for example):
xx <- round(10L * runif(4))
where xx is:
[1] 6 3 8 2
if I perform this operation:
apply(zz,2,function(x) x >= xx)
I get:
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE TRUE FALSE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE TRUE
What I want is an NA everywhere I have a TRUE element and the corresponding number from the zz matrix everywhere I have a FALSE (e.g., manually ...):
NA 5 NA
NA NA 1
5 NA 3
NA 1 NA
I can cobble together some "for" loops to do what I want, but is there a vector-based way to do this??
Thanks for any tips.
You could simply do:
zz[zz>=xx] <- NA
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
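This one-liner relies on recycling: zz >= xx compares element by element in column-major order, and because length(xx) equals nrow(zz), xx lines up with each column in turn. With a vector of a different length the recycling would line up differently, so the shortcut is specific to this shape. A quick check against the apply() version from the question (mask_recycled, mask_apply and out are illustrative names):

```r
zz <- matrix(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), nrow = 4)
xx <- c(6, 3, 8, 2)

# Direct comparison: xx is recycled down each column of zz
mask_recycled <- zz >= xx

# Explicit column-by-column comparison, as in the question
mask_apply <- apply(zz, 2, function(x) x >= xx)

# Both masks agree, so either can be used to assign NA
all(mask_recycled == mask_apply)

out <- zz
out[mask_recycled] <- NA
out
```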
Here is one option to get the expected output. We get a logical matrix (zz >= xx), using NA^ on that returns NA for the TRUE values and 1 for the FALSE, then multiply it with original matrix 'zz' so that NA remains as such while the 1 changes to the corresponding value in 'zz'.
NA^(zz >= xx)*zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Or another option is ifelse
ifelse(zz >= xx, NA, zz)
data
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)

Filtering out rows in a matrix containing only 0 in R

I have the matrix
m <- matrix(c(1, 0, 3, 4, 0, 6), 3)
I need to filter out rows where both columns have a value of 0, in effect returning the matrix:
m <- matrix(c(1, 3, 4, 6), 2)
I have tried
m[m[, 1] > 0 & m[, 2] > 0]
but this returns a vector instead of a matrix stripped of rows with only 0. This should be simple but I am stuck.
Thanks,
-Elizabeth
In case you have many columns, you can use rowSums on the squared values:
m
[,1] [,2]
[1,] 1 4
[2,] 0 0
[3,] 3 6
m^2
[,1] [,2]
[1,] 1 16
[2,] 0 0
[3,] 9 36
rowSums(m^2)
[1] 17 0 45
m[rowSums(m^2)>0,]
[,1] [,2]
[1,] 1 4
[2,] 3 6
You are just missing a "," in your own solution.
Use
m[m[,1]>0 & m[,2]>0,]
and it will work:
> m[m[,1]>0 & m[,2]>0,]
[,1] [,2]
[1,] 1 4
[2,] 3 6
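Note that m[,1] > 0 & m[,2] > 0 drops any row in which at least one column is not positive, which is stricter than "both columns are 0". For the example matrix the result is the same. A sketch that keeps every row with at least one non-zero entry, for any number of columns (m2 is a hypothetical matrix added here to show the difference):

```r
m <- matrix(c(1, 0, 3, 4, 0, 6), 3)

# Keep rows that contain at least one non-zero value
res <- m[rowSums(m != 0) > 0, , drop = FALSE]

# Hypothetical matrix with rows (1, 4), (0, 0), (0, 6)
m2 <- matrix(c(1, 0, 0, 4, 0, 6), 3)

# Removes only the all-zero row, keeping rows 1 and 3
all_zero_removed <- m2[rowSums(m2 != 0) > 0, , drop = FALSE]

# The stricter condition also removes row 3, because its first column is 0
strict <- m2[m2[, 1] > 0 & m2[, 2] > 0, , drop = FALSE]
```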

Resources