Separating and comparing complete cases and NA elements in a vector - r

I have a vector with numerical and NA elements.
For example,
data<-c(.4, -1, 1, NA, 8, NA, -.4)
data[complete.cases(data), ]
But what's the function to separate them into different vectors so I can compare them using graphs such as a boxplot and ECDF?

It's not clear what problem you are trying to solve. complete.cases returns a logical vector that you can use for selection, and you can negate it to select the remaining elements. You cannot index a vector with [ , ] as you attempted; that form would only have worked if 'data' were a data frame or a matrix.
data<-c(.4, -1, 1, NA, 8, NA, -.4)
data[complete.cases(data) ]
#[1] 0.4 -1.0 1.0 8.0 -0.4
data[!complete.cases(data) ]
#[1] NA NA
If the goal is just to select the non-NA items, it might be easier to use !is.na(data) as the selection vector. Here is a test case showing that complete.cases also works on the rows of a matrix, just as it does for data frames:
> dat <- matrix( sample(c(1,2,NA), 12, rep=TRUE), 3)
> dat
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] NA NA 2 2
[3,] 1 NA 2 1
> dat[ complete.cases(dat), ]
[1] 1 1 1 1
> dat[ ! complete.cases(dat), ]
[,1] [,2] [,3] [,4]
[1,] NA NA 2 2
[2,] 1 NA 2 1
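For the vector case in the question, a minimal sketch of the split might look like the following. The names present, observed, missing_at and Fn are illustrative and not from the original post. It assumes that only the non-NA values are worth plotting, so the NA side is kept as positions rather than values.

```r
data <- c(.4, -1, 1, NA, 8, NA, -.4)

# Logical selection vector: TRUE where a value is present
present <- !is.na(data)

# The two pieces: the observed values, and the positions of the NAs
observed   <- data[present]
missing_at <- which(!present)

# ecdf() returns a step function that can be evaluated or plotted
Fn <- ecdf(observed)
Fn(1)  # proportion of observed values that are <= 1
```

observed can then be passed to boxplot(observed) or plot(Fn) for the graphical comparison.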

Related

Count number of items within lists within matrix R

I have a matrix where some of the cells are NA and others are filled with a list of numbers. What I need is a way to calculate the number of items within each list for each cell of the matrix.
Here is the matrix:
> matrix_1
[,1] [,2]
[1,] NA c(1001, 1002)
[2,] c(1001, 1003) NA
Here is what I am looking for:
[,1] [,2]
[1,] NA 2
[2,] 2 NA
The actual data set is much, much larger - so I am trying to avoid loops.
Here is the dput:
matrix_1 <- structure(list(NA, c(1001, 1003), c(1001, 1002), NA), .Dim = c(2L,
2L))
You could decide to do:
NA^is.na(matrix_1) * lengths(matrix_1)
[,1] [,2]
[1,] NA 2
[2,] 2 NA
or even:
`is.na<-`(lengths(matrix_1), is.na(matrix_1))
[,1] [,2]
[1,] NA 2
[2,] 2 NA
Maybe you can try lengths + replace like below
> replace(lengths(matrix_1),which(is.na(matrix_1)),NA)
[,1] [,2]
[1,] NA 2
[2,] 2 NA
It seems that your description of the question and the expected output are slightly different.
The number of items in a list element containing a single NA is 1, not NA. So the answer to the question as described is:
matrix1=matrix(list(NA,c(1001,1003),c(1001,1002),NA),nrow=2)
answer=array(lengths(matrix1),dim=dim(matrix1))
answer
# [,1] [,2]
# [1,] 1 2
# [2,] 2 1
However, if you want every element that consists of a single NA entry to become NA itself (in agreement with your expected output), you can add an extra step:
answer[is.na(matrix1)]=NA
answer
# [,1] [,2]
# [1,] NA 2
# [2,] 2 NA
Note that elements with more than one item, some of which are NA, won't be detected by this last step (you'd need to use answer[sapply(matrix1,function(x) any(is.na(x)))]=NA instead).
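The difference between the two NA checks can be sketched on a slightly extended matrix. The object m_list and its third row are made up for illustration (the mixed cell c(1001, NA) is not in the original data), and vapply is swapped in for sapply only to pin down the return type.

```r
# Hypothetical 3 x 2 list-matrix; cell [3, 1] mixes a number with an NA
m_list <- structure(
  list(NA, c(1001, 1003), c(1001, NA),
       c(1001, 1002), NA, 7),
  .Dim = c(3L, 2L)
)

# Number of items per cell, with the matrix shape restored explicitly
counts <- array(lengths(m_list), dim = dim(m_list))

# is.na() on a list only flags cells that are a single NA
single_na <- is.na(m_list)

# any(is.na(x)) per cell also flags cells that merely contain an NA
any_na <- vapply(m_list, function(x) any(is.na(x)), logical(1))

strict <- counts; strict[single_na] <- NA  # cell [3, 1] keeps its count of 2
loose  <- counts; loose[any_na]     <- NA  # cell [3, 1] becomes NA as well
```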

Modifying empty matrix given matrix of positions and vector of values

Given the following setup:
> vals = matrix(nrow = 3,ncol = 4)
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] NA NA NA NA
[3,] NA NA NA NA
> position = matrix(c(1,2,1, 4,3,2, NA,NA,3, NA,NA,4), nrow = 3, ncol = 4)
[,1] [,2] [,3] [,4]
[1,] 1 4 NA NA
[2,] 2 3 NA NA
[3,] 1 2 3 4
> temp = c(10, 5, 8, 6, 9, 2, 4, 3)
I'm trying to populate vals with the values held in temp. However, the values must be placed in the spots given by position. Specifically, each row in position represents a row in vals, and the values represent the column in which the value must be placed.
For example, position[2,2] = 3. Since that's position's second row, the respective value must go into vals[2,3]. The final result would be:
[,1] [,2] [,3] [,4]
[1,] 10 NA NA 5
[2,] NA 8 6 NA
[3,] 9 2 4 3
This would be straightforward with for-loops, but can it be done without them?
We can use row/column matrix indexing. Build a two-column index matrix by cbinding the row index (created with row, transposed, and flattened to a vector with c) with the column index (the transposed 'position', also flattened with c). Transposing first makes the pairs run row by row, which is the order of the values in 'temp'. Then remove the pairs that contain NA (na.omit) and assign (<-) 'temp' to the elements of 'vals' selected by that index:
vals[na.omit(cbind(c(t(row(position))), c(t(position))))] <- temp
vals
# [,1] [,2] [,3] [,4]
#[1,] 10 NA NA 5
#[2,] NA 8 6 NA
#[3,] 9 2 4 3
data
position <- structure(c(1, 2, 1, 4, 3, 2, NA, NA, 3, NA, NA, 4), .Dim = 3:4)
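To see what the one-liner is indexing with, here is a sketch that builds the same index matrix step by step. The name idx and the column labels "row" and "col" are illustrative only, and plain is.na filtering is used in place of na.omit.

```r
position <- matrix(c(1, 2, 1, 4, 3, 2, NA, NA, 3, NA, NA, 4), nrow = 3, ncol = 4)
temp <- c(10, 5, 8, 6, 9, 2, 4, 3)
vals <- matrix(NA_real_, nrow = 3, ncol = 4)

# One (row, col) pair per cell of position, listed row by row
idx <- cbind(row = c(t(row(position))), col = c(t(position)))

# Drop the pairs whose target column is NA
idx <- idx[!is.na(idx[, "col"]), , drop = FALSE]

# A two-column numeric matrix used as an index addresses one cell per row
vals[idx] <- temp
vals
```

Printing idx before the assignment shows the eight (row, col) pairs in the same order as the eight values in temp.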

Setting matrix values comparing to vector in R

I want to set NA's in every element of a matrix where the value in a column is greater than or equal to the value of a given vector. For example, I can create a matrix:
set.seed(1)
zz <- matrix(data = round(10L * runif(12)), nrow = 4, ncol = 3)
which gives for zz:
[,1] [,2] [,3]
[1,] 8 5 7
[2,] 6 5 1
[3,] 5 10 3
[4,] 9 1 9
and for the comparison vector (for example):
xx <- round(10L * runif(4))
where xx is:
[1] 6 3 8 2
if I perform this operation:
apply(zz,2,function(x) x >= xx)
I get:
[,1] [,2] [,3]
[1,] TRUE FALSE TRUE
[2,] TRUE TRUE FALSE
[3,] FALSE TRUE FALSE
[4,] TRUE FALSE TRUE
What I want is an NA everywhere I have a TRUE element, and the corresponding number from the zz matrix everywhere I have a FALSE (e.g., manually ...):
NA 5 NA
NA NA 1
5 NA 3
NA 1 NA
I can cobble together some "for" loops to do what I want, but is there a vector-based way to do this??
Thanks for any tips.
You could simply do the following. Because zz has as many rows as xx has elements, xx is recycled down each column:
zz[zz>=xx] <- NA
zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Here is one option to get the expected output. We get a logical matrix (zz >= xx), using NA^ on that returns NA for the TRUE values and 1 for the FALSE, then multiply it with original matrix 'zz' so that NA remains as such while the 1 changes to the corresponding value in 'zz'.
NA^(zz >= xx)*zz
# [,1] [,2] [,3]
#[1,] NA 5 NA
#[2,] NA NA 1
#[3,] 5 NA 3
#[4,] NA 1 NA
Another option is ifelse:
ifelse(zz >= xx, NA, zz)
data
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)
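All of these rely on the same recycling behaviour, which can be sketched as below. The names mask and result are illustrative, and replace() is used here only so that zz itself is left unchanged.

```r
zz <- structure(c(8, 6, 5, 9, 5, 5, 10, 1, 7, 1, 3, 9), .Dim = c(4L, 3L))
xx <- c(6, 3, 8, 2)

# zz is stored column by column, so xx (length 4, equal to nrow(zz)) is
# recycled once per column: row i of every column is compared with xx[i]
mask <- zz >= xx

# replace() returns a modified copy and leaves zz untouched
result <- replace(zz, mask, NA)
result
```

mask is the same logical matrix that the apply() call in the question produces. This only lines up because length(xx) equals nrow(zz); a vector meant to be compared across the columns of each row would need a different approach.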

What does rbind.fill.matrix really do?

I have this code and can't understand how rbind.fill.matrix is used.
dtmat is a matrix with the documents on rows and words on columns.
word <- do.call(rbind.fill.matrix,lapply(1:ncol(dtmat), function(i) {
t(rep(1:length(dtmat[,i]), dtmat[,i]))
}))
I read the description of the function, which says that it binds matrices and fills missing columns with NA, but I cannot understand which matrices are being bound here.
From what I understand, the function fills in columns that are missing from one of the matrices with NA.
Let's say I have two matrices: A with two columns, col1 and col2, and B with three columns, col1, col2 and colA. I want to bind both matrices by row, but rbind only binds matrices with the same number of columns. rbind.fill.matrix matches the columns it can and puts NA wherever a matrix has no value for a column that exists in the other. The code below shows this more clearly.
> a <- matrix(c(1,1,2,2), nrow = 2, byrow = TRUE)
> a
[,1] [,2]
[1,] 1 1
[2,] 2 2
>
> b <- matrix(c(1,1,1,2,2,2,3,3,3), nrow = 3, byrow = TRUE)
> b
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
>
> library(plyr)
> r <- rbind.fill.matrix(a,b)
> r
1 2 3
[1,] 1 1 NA
[2,] 2 2 NA
[3,] 1 1 1
[4,] 2 2 2
[5,] 3 3 3
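The documentation also explains how column names are handled: columns are matched by name, and when a matrix has no column names the column number is used instead, which is why the result above has columns labelled 1, 2 and 3.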
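To connect this back to the code in the question, here is a base R sketch of the padding behaviour. rbind_fill_sketch is a hypothetical stand-in written for this illustration; it is not plyr's implementation and it ignores column names entirely. The small dtmat is also made up.

```r
# Hypothetical helper: pad unnamed matrices with NA columns on the right,
# then stack them by row
rbind_fill_sketch <- function(...) {
  mats  <- list(...)
  width <- max(vapply(mats, ncol, integer(1)))
  padded <- lapply(mats, function(m) {
    cbind(m, matrix(NA, nrow = nrow(m), ncol = width - ncol(m)))
  })
  do.call(rbind, padded)
}

# Made-up document-term matrix: 2 documents (rows) x 2 words (columns)
dtmat <- matrix(c(2, 1, 0, 1), nrow = 2)

# For each word, repeat each document index by that word's count in the
# document; t() turns the vector into a one-row matrix
rows <- lapply(seq_len(ncol(dtmat)), function(i) {
  t(rep(seq_len(nrow(dtmat)), dtmat[, i]))
})

# The one-row matrices have different widths, so the shorter ones get NAs
word <- do.call(rbind_fill_sketch, rows)
word
```

So in the question's code, the matrices being bound are the one-row matrices produced by lapply, one per word, and the NA fill is what lets rows of different lengths be stacked.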

Consecutive NAs in a column

I'd like to remove the rows that are part of a run of 3 or more consecutive NAs in one column.
[,1] [,2]
[1,] 1 1
[2,] NA 1
[3,] 2 4
[4,] NA 3
[6,] 1 4
[7,] NA 8
[8,] NA 5
[9,] NA 6
so I'd have this data
[,1] [,2]
[1,] 1 1
[2,] NA 1
[3,] 2 4
[4,] NA 3
[6,] 1 4
I did some research and tried this code
data[! rowSums(is.na(data)) >3 , ]
but I think this only handles NAs within a row, not consecutive NAs down a column.
As mentioned, rle is a good place to start:
is.na.rle <- rle(is.na(data[, 1]))
Since NAs are "bad" only when they come by three or more, we can re-write the values:
is.na.rle$values <- is.na.rle$values & is.na.rle$lengths >= 3
Finally, use inverse.rle to build the vector of indices to filter:
data[!inverse.rle(is.na.rle), ]
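Applied to a matrix like the one in the question, the intermediate values look as follows. This is only a sketch: the matrix is rebuilt by hand from the question's printout, and the names runs, bad, drop_row and cleaned are illustrative.

```r
data <- matrix(c(1, NA, 2, NA, 1, NA, NA, NA,
                 1, 1, 4, 3, 4, 8, 5, 6), ncol = 2)

# Run-length encode the NA pattern of column 1
runs <- rle(is.na(data[, 1]))
runs$lengths  # 1 1 1 1 1 3
runs$values   # FALSE TRUE FALSE TRUE FALSE TRUE

# Keep the TRUE flag only for NA runs of length 3 or more
bad <- runs
bad$values <- runs$values & runs$lengths >= 3

# Expand back to one flag per row and drop the flagged rows
drop_row <- inverse.rle(bad)
cleaned  <- data[!drop_row, , drop = FALSE]
cleaned
```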
You could use rle, or you could do this:
library(data.table)
d = data.table(a = c(1,NA,2,NA,3,4,NA,NA,NA), b = c(1:9))
d[d[, if(.N > 3) {.I[1]} else {.I}, by = cumsum(!is.na(a))]$V1]
# a b
#1: 1 1
#2: NA 2
#3: 2 3
#4: NA 4
#5: 3 5
#6: 4 6
Run d[, cumsum(!is.na(a))] to see why this works. Also, I could've used .SD instead of .I to get cleaner code, but opted for efficiency instead.
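The grouping idea can also be sketched in base R, without data.table. The names grp, size and keep are illustrative, and ave() is swapped in for the by-group step; like the code above, this assumes the vector does not start with a run of NAs.

```r
a <- c(1, NA, 2, NA, 3, 4, NA, NA, NA)

# Each non-NA value starts a new group; the NAs that follow it share its id
grp <- cumsum(!is.na(a))
grp  # 1 1 2 2 3 4 4 4 4

# Size of the group each element belongs to (one non-NA plus trailing NAs)
size <- ave(seq_along(a), grp, FUN = length)

# A group larger than 3 holds 3 or more NAs: keep only its leading value
keep <- !(size > 3 & is.na(a))
which(keep)  # 1 2 3 4 5 6
```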
As @DirkEddelbuettel suggested, the rle() function will help. You can create your own function to identify the elements of a vector with 3 or more consecutive NA values.
consecna <- function(x, n=3) {
# function to identify elements with n or more consecutive NA values
y <- rle(is.na(x))
y$values <- y$lengths > (n - 0.5) & y$values
inverse.rle(y)
}
Then you can apply this function to each column of your matrix.
# example matrix of data
m <- matrix(c(1, NA, 2, NA, 1, NA, NA, NA, 1, 1, 4, 3, 4, 8, 5, 6), ncol=2)
# index matrix identifying elements with 3 or more consecutive NA values
mindex <- apply(m, 2, consecna)
Then use the created index matrix to get rid of all those rows that were identified.
# removal of all the identified rows
m2 <- m[!apply(mindex, 1, any), ]
