How to find position of missing values in a vector - r

What features does the R language have for finding missing values in a data frame, or at least for checking whether the data frame has any missing values?

x = matrix(rep(c(NA, 1,NA), 3), ncol=3, nrow=3)
print(x)
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 1 1 1
[3,] NA NA NA
A logical matrix marking which values are NA:
is.na(x)
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] FALSE FALSE FALSE
[3,] TRUE TRUE TRUE
Row and column indices of the NA values:
which(is.na(x), arr.ind = TRUE)
row col
[1,] 1 1
[2,] 3 1
[3,] 1 2
[4,] 3 2
[5,] 1 3
[6,] 3 3
Check whether the matrix has any missing values:
any(is.na(x))
[1] TRUE
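Since the question asks about data frames rather than matrices, here is a minimal sketch of the same three checks on a data frame. The data frame `df` and its column names are made up for illustration; `is.na()` returns a logical matrix for a data frame, so the calls above carry over unchanged.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", "y", NA),
  c = c(TRUE, FALSE, TRUE)
)

# TRUE if there is at least one NA anywhere in the data frame
has_missing <- any(is.na(df))

# Number of NAs in each column (named by column)
na_per_column <- colSums(is.na(df))

# Row/column position of every NA
na_positions <- which(is.na(df), arr.ind = TRUE)

print(has_missing)
print(na_per_column)
print(na_positions)
```

The per-column counts are often the quickest way to see which variables are affected before deciding how to handle them.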

It's hard to tell from the example you've given, and more detail on the structure of "data" would help. But if you simply want to exclude any observation (row) of your data that has a missing value anywhere in it, try:
cleanDat <- na.omit(data)
Note, there is a nice tutorial on missing data which is where I looked to confirm I had this right.
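As a rough illustration of what `na.omit()` does, the sketch below applies it to a small made-up data frame (`dat` and its columns are assumptions, not the asker's data) and compares it with the equivalent `complete.cases()` subset.

```r
# Hypothetical data; rows 2 and 3 each contain one NA
dat <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, 40),
  grp   = c("a", "b", NA, "d")
)

# Drop every row that has an NA in any column
clean_dat <- na.omit(dat)

# The same rows selected explicitly with a logical index
same_rows <- dat[complete.cases(dat), ]

# na.omit() records which rows it removed in the "na.action" attribute
dropped <- attr(clean_dat, "na.action")

print(clean_dat)
print(as.integer(dropped))
```

Checking the `na.action` attribute is a cheap way to see how many observations were lost.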

Related

Count number of items within lists within matrix R

I have a matrix where some of the cells are NA and others hold a list of numbers. What I need is a way to count the number of items in the list in each cell of the matrix.
Here is the matrix:
> matrix_1
[,1] [,2]
[1,] NA c(1001, 1002)
[2,] c(1001, 1003) NA
Here is what I am looking for:
[,1] [,2]
[1,] NA 2
[2,] 2 NA
The actual data set is much, much larger - so I am trying to avoid loops.
Here is the dput:
matrix_1 <- structure(list(NA, c(1001, 1003), c(1001, 1002), NA), .Dim = c(2L,
2L))
You could decide to do:
NA^is.na(matrix_1) * lengths(matrix_1)
[,1] [,2]
[1,] NA 2
[2,] 2 NA
or even:
`is.na<-`(lengths(matrix_1), is.na(matrix_1))
[,1] [,2]
[1,] NA 2
[2,] 2 NA
Maybe you can try lengths + replace like below
> replace(lengths(matrix_1),which(is.na(matrix_1)),NA)
[,1] [,2]
[1,] NA 2
[2,] 2 NA
It seems that your description of the question and the expected output are slightly different.
The number of items in a list element containing a single NA is 1, not NA. So the answer to this is:
matrix1=matrix(list(NA,c(1001,1003),c(1001,1002),NA),nrow=2)
answer=array(lengths(matrix1),dim=dim(matrix1))
answer
# [,1] [,2]
# [1,] 1 2
# [2,] 2 1
However, if you want every element that corresponds to a single NA entry to become NA itself (in agreement with your expected output), you can add one extra step:
answer[is.na(matrix1)]=NA
answer
# [,1] [,2]
# [1,] NA 2
# [2,] 2 NA
Note that this last step won't detect elements with more than one item where only some of the items are NA (you'd need to use answer[sapply(matrix1,function(x) any(is.na(x)))]=NA instead).
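Putting the pieces of this answer together, a minimal sketch could look like the following. It uses `vapply()` rather than `sapply()` so the result type is fixed, and it sets the dimensions explicitly instead of relying on `lengths()` keeping them; both choices are mine, not part of the answers above.

```r
# The matrix from the question: a 2x2 list matrix
matrix_1 <- matrix(list(NA, c(1001, 1003), c(1001, 1002), NA), nrow = 2)

# Number of items in each cell (a lone NA counts as 1 item here)
counts <- vapply(matrix_1, length, integer(1))

# Flag every cell that contains at least one NA
has_na <- vapply(matrix_1, function(cell) any(is.na(cell)), logical(1))

# Replace the counts of flagged cells with NA
counts[has_na] <- NA

# Restore the matrix shape
dim(counts) <- dim(matrix_1)

print(counts)
```

If cells that mix numbers and NAs should still be counted, replace the `has_na` line with `is.na(matrix_1)`, which only flags cells that are a single NA.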

R correlation coefficient of columns for each row of matrix

I am trying to get the correlation coefficient between the columns of each row of the matrix. I am really new to R, and this is a real beginner question: it is one of the first tasks I have to do for class.
Matrix:
A2
[,1] [,2]
[1,] 4 -2
[2,] 8 -3
[3,] 6 1
[4,] 2 2
[5,] -1 1
I tried to use cor(A2), since I read it will automatically calculate the correlation coefficient for the columns of each row, but it gives me the following result:
cor(A2)
[,1] [,2]
[1,] 1.0000000 -0.6338878
[2,] -0.6338878 1.0000000
When I use cor(t(A2)) instead, I get:
cor(t(A2))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 1 NA -1
[2,] 1 1 1 NA -1
[3,] 1 1 1 NA -1
[4,] NA NA NA 1 NA
[5,] -1 -1 -1 NA 1
But I expected it to have 5 rows, one column with the result in it.
There are several ways to use the cor() function. If you want to calculate the correlation between two columns in a matrix, then you can provide two arguments like this:
> cor(A2[,1], A2[,2])
[1] -0.6338878
If you input a single matrix as an argument, then it will return a correlation matrix.
> cor(A2)
[,1] [,2]
[1,] 1.0000000 -0.6338878
[2,] -0.6338878 1.0000000
In this case, position [1,1] is the correlation between A2[,1] and A2[,1] (which is exactly 1). In position [1,2], you can find the correlation between A2[,1] and A2[,2]. The correlation matrix is symmetric, and the diagonal is always 1, because the correlation of a vector with itself is 1.
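To make the relationship between the two call styles concrete, here is a small sketch that rebuilds A2 from the question and checks that both give the same number, alongside a manual Pearson computation. The variable names `r_direct`, `r_matrix` and `r_manual` are only for illustration.

```r
# A2 as shown in the question
A2 <- matrix(c(4, 8, 6, 2, -1,
               -2, -3, 1, 2, 1), ncol = 2)

# Correlation between the two columns, passed as two vectors
r_direct <- cor(A2[, 1], A2[, 2])

# The same value read from the off-diagonal of the correlation matrix
r_matrix <- cor(A2)[1, 2]

# Pearson correlation computed by hand from deviations about the mean
x <- A2[, 1]
y <- A2[, 2]
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(c(r_direct, r_matrix, r_manual))
```

This also explains the cor(t(A2)) output in the question: each row holds only two values, so any correlation between two rows is 1, -1, or NA (NA when a row is constant, as in row 4 with values 2 and 2).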

Indexing matrices when some elements of the selector are missing (R)

When some elements of a vector used for row-indexing a matrix or a data.frame are missing (NA) in R, the indexing operation gives results that I find unexpected.
m = matrix(1:15,ncol = 3)
m[1,1] = NA
m[m[,1] < 4 ,]
Gives
[,1] [,2] [,3]
[1,] NA NA NA
[2,] 2 7 12
[3,] 3 8 13
While I would have expected
[,1] [,2] [,3]
[1,] NA 6 11
[2,] 2 7 12
[3,] 3 8 13
One option seems to be
m[m[,1] < 4 | is.na(m[,1]) ,]
But I find this cumbersome. It often happens that I lose data by mistake when indexing matrices and data.frames that contain missing values. Is there an easier and safer way to reach the desired result?
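One possible way to make the intent explicit is sketched below. An NA in a logical index produces a row of NAs, so the condition has to be resolved to TRUE or FALSE before indexing. `which()` treats NA as FALSE and drops those rows; the small helper `keep_na()` is hypothetical (not a base R function) and treats NA as TRUE, which matches the result the question expects.

```r
m <- matrix(1:15, ncol = 3)
m[1, 1] <- NA

# The condition itself contains an NA for row 1
cond <- m[, 1] < 4

# Option 1: which() ignores NA, so rows with a missing condition are dropped
dropped_na <- m[which(cond), , drop = FALSE]

# Option 2: hypothetical helper that treats a missing condition as TRUE
keep_na <- function(cond) cond | is.na(cond)
kept_na <- m[keep_na(cond), , drop = FALSE]

print(dropped_na)
print(kept_na)
```

Either way the choice is visible at the call site, which may help avoid losing or corrupting rows by accident.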

Preserve structure, when indexing a matrix with another matrix in R

Dear StackOverflowers,
I have an integer matrix in R and I would like to subset it so that I remove 1 specified cell in each column. So that, for instance, a 4x3 matrix becomes a 3x3 matrix. I have tried doing it by creating the second logical matrix of the same dimensions.
(subject.matrix <- matrix(1:12, nrow = 4))
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
(query.matrix <- matrix(c(T, T, F, T, T, F, T, T, T, T, T, F), nrow = 4))
[,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE
[4,] TRUE TRUE FALSE
The problem is that, when I index the first matrix by the second one, it is simplified to an integer vector.
subject.matrix[query.matrix]
[1] 1 2 4 5 7 8 9 10 11
I've tried adding drop=F, but to no avail. I know, I can just wrap the resulting vector into a 3x3 matrix. So the expected outcome would be:
matrix(subject.matrix[query.matrix], nrow = 3)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 7 10
[3,] 4 8 11
But I wonder if there's a more elegant/direct solution. I'm also not attached to using a logical matrix as the index, if that means a simpler solution. Perhaps, I could subset it with a vector of indices for the rows to be removed in each column, which in this case would translate into c(3, 2, 4).
Many thanks!
Edit based on @LyzandeR's suggestion: My final goal was to take column sums of the resulting matrix. So replacing the redundant values with NAs seems to be the best way to go.
I think the only way to preserve the matrix structure is a more general version of what you already do in your question, i.e.:
matrix(subject.matrix[query.matrix], ncol = ncol(subject.matrix))
You could even convert it into a function if you plan on using it multiple times:
subset.mat <- function(mat, index, cols=ncol(mat)) {
  matrix(mat[index], ncol = cols)
}
Output:
> subset.mat(subject.matrix, query.matrix)
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 7 10
[3,] 4 8 11
Also (sorry just read your updated comment) you might consider using NAs in the matrix instead of subsetting them out, which will allow you to calculate the column sums as you say:
subject.matrix[!query.matrix] <- NA
subject.matrix
# [,1] [,2] [,3]
#[1,] 1 5 9
#[2,] 2 NA 10
#[3,] NA 7 11
#[4,] 4 8 NA
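With the excluded cells set to NA, the column sums mentioned in the question edit can be taken by skipping NAs. A minimal sketch, working on a copy (the name `masked` is mine) so the original matrix stays intact:

```r
subject.matrix <- matrix(1:12, nrow = 4)
query.matrix <- matrix(c(TRUE, TRUE, FALSE, TRUE,
                         TRUE, FALSE, TRUE, TRUE,
                         TRUE, TRUE, TRUE, FALSE), nrow = 4)

# Mask the excluded cells on a copy of the matrix
masked <- subject.matrix
masked[!query.matrix] <- NA

# Column sums over the remaining cells only
col_totals <- colSums(masked, na.rm = TRUE)

print(col_totals)
```

Note that na.rm = TRUE would also skip any NAs that were already present in the data, so this assumes the original matrix has none.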
This is a little brute-forceish, but I think you'll be able to extrapolate it into something more general:
new.matrix = matrix(ncol = ncol(subject.matrix), nrow = nrow(subject.matrix) - 1)
for (i in seq_len(ncol(subject.matrix))) {
  # keep only the entries of column i that are TRUE in query.matrix
  new.matrix[, i] = subject.matrix[query.matrix[, i], i]
}
new.matrix
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 7 10
[3,] 4 8 11
Essentially, I just initialized an empty matrix, and then iterated through each column of subject.matrix taking only the TRUE values for query.matrix.
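The question also mentions describing the removal with a vector of row indices, one per column, such as c(3, 2, 4). A sketch of that variant, assuming exactly one row is dropped from every column so that all columns keep the same length (the name `drop_rows` is mine):

```r
subject.matrix <- matrix(1:12, nrow = 4)

# Row to remove in each column, as suggested in the question
drop_rows <- c(3, 2, 4)

# For each column j, drop the chosen row; sapply() binds the
# equal-length results column by column into a matrix
result <- sapply(seq_len(ncol(subject.matrix)), function(j) {
  subject.matrix[-drop_rows[j], j]
})

print(result)
```

If the columns could end up with different lengths, `sapply()` would return a list instead of a matrix, so the equal-length assumption matters here.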

Get elements over opposite diagonal in a matrix in R

I am trying to solve a little problem with a matrix in R. I have the following matrix in R (alfa):
alfa <- matrix(1:9,nrow=3)
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
The opposite diagonal of alfa should be filled with zeros. I would like to get, in a new matrix, all elements above this opposite diagonal (maybe the upper triangle above this diagonal). I wish to get a new matrix like this:
[,1] [,2] [,3]
[1,] 1 4 0
[2,] 2 0 0
[3,] 0 0 0
Or like this matrix with NA:
[,1] [,2] [,3]
[1,] 1 4 0
[2,] 2 0 NA
[3,] 0 NA NA
Here the elements located below the opposite diagonal of alfa are zero or NA, as you can see. I have tried code using row(alfa) and col(alfa), but I can't get the expected matrix, for example:
(row(alfa)+col(alfa)-1)%%ncol(alfa)!=0
And I got this result, where the elements both above and below the opposite diagonal are TRUE:
[,1] [,2] [,3]
[1,] TRUE TRUE FALSE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE
But I only want the upper elements, and the remaining elements should be filled with zero or NA.
Many thanks for your help.
lower.tri almost does what you want, but you need to reverse the rows.
alfa[apply(lower.tri(alfa), 1, rev)] <- NA
Here, the matrix of the lower anti-diagonal is built, and used to select into alfa (vector indexing) for replacement.
lower.tri has a diag argument, which will also select the diagonal if set to TRUE.
f <- function(mat, diag = 0, offdiag = NA){
  rev_vec <- seq(ncol(mat), 1)
  j <- mat[,rev_vec]
  j[lower.tri(j)] <- offdiag
  diag(j) <- diag
  j[,rev_vec]
}
You can specify if you want the off-diagonals to be NA or 0 by changing the offdiag parameter.
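Since the question started from row(alfa) and col(alfa), here is a sketch of how that idea could be finished, assuming a square matrix. On an n x n matrix, row + col equals n + 1 exactly on the opposite diagonal and is larger below it. The names `anti` and `out` are only for illustration.

```r
alfa <- matrix(1:9, nrow = 3)
n <- nrow(alfa)

# row + col is n + 1 on the opposite diagonal, larger below it
anti <- row(alfa) + col(alfa)

out <- alfa
out[anti == n + 1] <- 0   # the opposite diagonal itself
out[anti > n + 1] <- NA   # everything below the opposite diagonal

print(out)
```

Replacing NA with 0 in the last assignment gives the all-zero variant from the question.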
