Recode NA when another column value is NA in R

I have a quick recoding question. Here is what my sample dataset looks like:
df <- data.frame(id = c(1, 2, 3),
                 i1 = c(1, NA, 0),
                 i2 = c(1, 1, 1))
> df
  id i1 i2
1  1  1  1
2  2 NA  1
3  3  0  1
When i1 is NA, I need to recode i2 to NA as well. I tried the following, but no luck:
df %>%
  mutate(i2 = case_when(
    i1 == NA ~ NA_real_,
    TRUE ~ as.character(i2)))
Error in `mutate()`:
! Problem while computing `i2 = case_when(i1 == "NA" ~ NA_real_, TRUE ~ as.character(i2))`.
Caused by error in `` names(message) <- `*vtmp*` ``:
! 'names' attribute [1] must be the same length as the vector [0]
My desired output looks like this:
> df
  id i1 i2
1  1  1  1
2  2 NA NA
3  3  0  1

Would a simple assignment meet your requirements for this?
df$i2[is.na(df$i1)] <- NA
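For what it's worth, the posted case_when() fails because i1 == NA is never TRUE (any comparison with NA yields NA), and it also mixes a numeric NA_real_ with a character branch. A dplyr sketch of the same fix (assuming the sample df above):
library(dplyr)
df %>%
  mutate(i2 = case_when(
    is.na(i1) ~ NA_real_,   # test missingness with is.na(), not ==
    TRUE ~ as.numeric(i2))) # keep both branches the same type (numeric)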

Here is an option:
t(apply(df, 1, \(x) if (any(is.na(x))) cumsum(x) else x))
#     id i1 i2
#[1,]  1  1  1
#[2,]  2 NA NA
#[3,]  3  0  1
The idea is to take the cumulative sum of a row whenever it contains an NA: once term i is NA, every subsequent term is NA as well (since e.g. NA + 1 = NA), so the NA propagates across the row. Since your sample data df is all numeric, I recommend using a matrix (rather than a data.frame); matrix operations are usually faster than data.frame (i.e. list) operations.
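A quick illustration of the propagation (a minimal example, not from the original answer):
cumsum(c(2, NA, 1))
#[1]  2 NA NA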
Key assumptions:
id cannot be NA.
This replaces NAs in i2 based on an NA in i1 per row.
A tidyverse solution
I advise against a tidyverse solution here for a couple of reasons:
Your data is all-numerical, so a matrix is a more suitable data structure than a data.frame/tibble.
dplyr/tidyr syntax generally operates efficiently on columns; as soon as you want to work row-wise, dplyr (and its family of packages) may not be the best fit (despite dplyr::rowwise(), which just introduces row-number-based grouping).
With that out of the way, you can transpose the problem.
library(tidyverse)
df %>%
  transpose() %>%
  map(~ { if (is.na(.x$i1)) .x$i2 <- NA_real_; .x }) %>%
  transpose() %>%
  as_tibble() %>%
  unnest(everything())
## A tibble: 3 × 3
#     id    i1    i2
#  <dbl> <dbl> <dbl>
#1     1     1     1
#2     2    NA    NA
#3     3     0     1

Related

Remove columns that only contains NA NULL rows in R [duplicate]

I have a data frame where some of the columns contain NA values.
How can I remove columns where all rows contain NA values?
Try this:
df <- df[, colSums(is.na(df)) < nrow(df)]
The two approaches offered thus far fail with large data sets as (amongst other memory issues) they create is.na(df), which will be an object the same size as df.
Here are two approaches that are more memory- and time-efficient.
An approach using Filter
Filter(function(x) !all(is.na(x)), df)
and an approach using data.table (for general time and memory efficiency)
library(data.table)
DT <- as.data.table(df)
DT[, which(unlist(lapply(DT, function(x) !all(is.na(x))))), with = FALSE]
Examples using large data (30 columns, 1e6 rows):
big_data <- replicate(10, data.frame(rep(NA, 1e6), sample(c(1:8,NA),1e6,T), sample(250,1e6,T)),simplify=F)
bd <- do.call(data.frame,big_data)
names(bd) <- paste0('X',seq_len(30))
DT <- as.data.table(bd)
system.time({df1 <- bd[, colSums(is.na(bd)) < nrow(bd)]})
# error -- can't allocate vector of size ...
system.time({df2 <- bd[, !apply(is.na(bd), 2, all)]})
# error -- can't allocate vector of size ...
system.time({df3 <- Filter(function(x)!all(is.na(x)), bd)})
## user system elapsed
## 0.26 0.03 0.29
system.time({DT1 <- DT[,which(unlist(lapply(DT, function(x)!all(is.na(x))))),with=F]})
## user system elapsed
## 0.14 0.03 0.18
Update
You can now use select with the where selection helper. select_if is superseded, but still functional as of dplyr 1.0.2 (thanks to @mcstrother for bringing this to attention).
library(dplyr)
temp <- data.frame(x = 1:5, y = c(1, 2, NA, 4, 5), z = rep(NA, 5))
not_all_na <- function(x) any(!is.na(x))
not_any_na <- function(x) all(!is.na(x))
> temp
  x  y  z
1 1  1 NA
2 2  2 NA
3 3 NA NA
4 4  4 NA
5 5  5 NA
> temp %>% select(where(not_all_na))
  x  y
1 1  1
2 2  2
3 3 NA
4 4  4
5 5  5
> temp %>% select(where(not_any_na))
  x
1 1
2 2
3 3
4 4
5 5
Old Answer
dplyr now has a select_if verb that may be helpful here:
> temp
  x  y  z
1 1  1 NA
2 2  2 NA
3 3 NA NA
4 4  4 NA
5 5  5 NA
> temp %>% select_if(not_all_na)
  x  y
1 1  1
2 2  2
3 3 NA
4 4  4
5 5  5
> temp %>% select_if(not_any_na)
  x
1 1
2 2
3 3
4 4
5 5
Late to the game but you can also use the janitor package. This function will remove columns which are all NA, and can be changed to remove rows that are all NA as well.
df <- janitor::remove_empty(df, which = "cols")
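As the answer notes, the same function handles rows; for instance (a variant of the call above, not from the original answer):
df <- janitor::remove_empty(df, which = "rows")  # drop rows that are entirely NA instead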
Another way would be to use the apply() function.
If you have the data.frame
df <- data.frame(var1 = c(1:7, NA),
                 var2 = c(1, 2, 1, 3, 4, NA, NA, 9),
                 var3 = c(NA))
then you can use apply() to see which columns fulfill your condition and so you can simply do the same subsetting as in the answer by Musa, only with an apply approach.
> !apply(is.na(df), 2, all)
 var1  var2  var3 
 TRUE  TRUE FALSE 
> df[, !apply(is.na(df), 2, all)]
  var1 var2
1    1    1
2    2    2
3    3    1
4    4    3
5    5    4
6    6   NA
7    7   NA
8   NA    9
Another option, with the purrr package:
library(dplyr)
df <- data.frame(a = NA,
                 b = 1:5,
                 c = c(rep(1, 4), NA))
df %>% purrr::discard(~all(is.na(.)))
df %>% purrr::keep(~!all(is.na(.)))
df[sapply(df, function(x) all(is.na(x)))] <- NULL
An old question, but I think we can update @mnel's nice answer with a simpler data.table solution:
DT[, .SD, .SDcols = \(x) !all(is.na(x))]
(I'm using the new \(x) lambda function syntax available in R >= 4.1, but really the key thing is to pass the logical subsetting through .SDcols.)
Speed is equivalent.
microbenchmark::microbenchmark(
  which_unlist = DT[, which(unlist(lapply(DT, \(x) !all(is.na(x))))), with = FALSE],
  sdcols = DT[, .SD, .SDcols = \(x) !all(is.na(x))],
  times = 2
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> which_unlist 51.32227 51.32227 56.78501 56.78501 62.24776 62.24776 2 a
#> sdcols 43.14361 43.14361 49.33491 49.33491 55.52621 55.52621 2 a
You can use the janitor package's remove_empty:
library(janitor)
df %>%
  remove_empty(c("rows", "cols"))  # select either "rows" or "cols", or both
Also, another dplyr approach:
library(dplyr)
df %>% select_if(~all(!is.na(.)))
OR
df %>% select_if(colSums(!is.na(.)) == nrow(df))
This is also useful if you want to exclude or keep only columns with a certain number of non-missing values, e.g.
df %>% select_if(colSums(!is.na(.))>500)
I hope this may also help. It could be made into a single command, but I found it easier to read split into two commands. I made a function with the following instructions and it worked lightning fast.
naColsRemoval <- function(DataTable) {
  na.cols <- DataTable[, .(which(apply(is.na(.SD), 2, all)))]
  DataTable[, unlist(na.cols) := NULL, with = FALSE]
}
.SD allows you to limit the check to part of the table if you wish, but it will take the whole table as the default.
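For instance, to restrict the check to a subset of columns via .SDcols (the column names here are hypothetical, for illustration only):
DataTable[, apply(is.na(.SD), 2, all), .SDcols = c("var1", "var2")]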
A handy base R option could be colMeans():
df[, colMeans(is.na(df)) != 1]
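Because colMeans(is.na(df)) returns the fraction of NAs per column, the same idiom also supports partial thresholds (an illustrative cutoff, not from the original answer):
df[, colMeans(is.na(df)) < 0.5]  # keep columns that are less than half NA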
janitor::remove_constant() does this very nicely.
From my experience of having trouble applying previous answers, I found that I needed to modify their approach in order to achieve what the question here asks:
How to get rid of columns where for ALL rows the value is NA?
First, note that my solution will only work if you do not have duplicate columns (that issue is dealt with elsewhere on Stack Overflow).
Second, it uses dplyr.
Instead of
df <- df %>% select_if(~all(!is.na(.)))
I find that what works is
df <- df %>% select_if(~!all(is.na(.)))
The point is that the "not" symbol "!" needs to be outside the universal quantifier: select_if acts on columns, and in this case it selects only those columns that do not satisfy the criterion
every element is equal to NA
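To see the difference concretely on the temp data frame from earlier (a sketch; assumes dplyr is loaded):
temp %>% select_if(~!all(is.na(.)))  # keeps x and y, drops only the all-NA column z
temp %>% select_if(~all(!is.na(.)))  # keeps only x; y is dropped because it contains one NA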

Remove rows that contain at least an NA only if one column contains a specific value

I have the following data frame:
a  b  c
x  1  1
x  1 NA
y NA  1
y  1  1
I would like to remove the rows containing at least one NA in any column, but only if the "a" column contains a "y". So the result would be:
a  b  c
x  1  1
x  1 NA
y  1  1
So far I have tried this:
my_DF %>%
  filter(!(any(is.na(.)) & a == "y"))
but the resulting data frame is the following:
a  b  c
x  1  1
x  1 NA
so this just removes any row in which "a" contains a "y", regardless of whether the row also contains NAs in at least one column.
How could I change the "any(is.na(.))" part of the command (I guess that is the wrong part) in order for it to work?
You can use the new if_any approach, introduced in dplyr 1.0.4, and used within the filter function. The following code will achieve the result you're after:
my_DF %>%
  filter(!(a == "y" & if_any(everything(), ~ is.na(.x))))
Explanation of individual bits
filter - keep all rows where
! - it's not true that
everything() - check all columns (alternatively, you could specify a vector of column names, e.g. c("b", "c"))
if_any(everything(), ~ is.na(.x)) - if any column has NA (there is also an if_all version)
Full reproducible example
my_DF <- data.frame(a = c("x", "x", "y", "y"),
                    b = c(1, 1, NA, 1),
                    c = c(1, NA, 1, 1))
my_DF %>%
  filter(!(a == "y" & if_any(everything(), ~ is.na(.x))))
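For contrast, since if_all() was mentioned above: with if_all(), a "y" row would only be dropped when every column is NA (a hypothetical variant, not what the question asks for):
my_DF %>%
  filter(!(a == "y" & if_all(everything(), ~ is.na(.x))))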
You can do:
my_DF <- read.table(header=TRUE, text=
"a b c
x 1 1
x 1 NA
y NA 1
y 1 1")
i <- apply(is.na(my_DF), 1, any) & my_DF$a == "y"
my_DF[!i, ]
You can do it in data.table:
library(data.table)
setDT(my_DF)
my_DF <- my_DF[complete.cases(my_DF) | a != 'y']
This thread answers your question I guess - How to filter rows out of data.table where any column is NA without specifying columns individually
In base R you may do
my_DF <- read.table(header=TRUE, text=
"a b c
x 1 1
x 1 NA
y NA 1
y 1 1")
my_DF[rowSums(is.na(my_DF)) == 0 | my_DF$a == 'x',]
#> a b c
#> 1 x 1 1
#> 2 x 1 NA
#> 4 y 1 1
Created on 2021-06-29 by the reprex package (v2.0.0)

Removing rows having only zeros [duplicate]

I want to remove all the rows having either zeros or NAs. In the code below I am selecting the numeric variables and then filtering out the 0s. The problem is that it does not return the character variables along with the numeric ones in the final output.
df <- read.table(header = TRUE, text =
"x y z
a 1 2
b 0 3
c 1 NA
d 0 NA
")
df %>% select_if(is.numeric) %>% filter(rowSums(., na.rm = TRUE) != 0)
You can use filter_if:
library(dplyr)
df %>% filter_if(is.numeric, any_vars(. != 0 & !is.na(.)))
#  x y  z
#1 a 1  2
#2 b 0  3
#3 c 1 NA
Or using base R :
cols <- sapply(df, is.numeric)
df[rowSums(!is.na(df[cols]) & df[cols] != 0) > 0, ]
Another dplyr option could be:
df %>%
  rowwise() %>%
  filter(any(across(where(is.numeric)) != 0, na.rm = TRUE))

  x         y     z
  <fct> <int> <int>
1 a         1     2
2 b         0     3
3 c         1    NA
Following the suggestions written in this new doc page after the release of dplyr version 1.0.0, you can create a helper function to substitute the superseded functions filter_if and any_vars.
Previously, filter() was paired with the all_vars() and any_vars()
helpers. Now, across() is equivalent to all_vars(), and there’s no
direct replacement for any_vars(). However you can make a simple
helper yourself
From now on, this should be the reference method for this kind of filtering step.
rowAny <- function(x) {rowSums(x != 0 & !is.na(x)) > 0}
df %>% filter(rowAny(across(where(is.numeric))))
# x y z
# 1 a 1 2
# 2 b 0 3
# 3 c 1 NA
You could simply do
df[rowSums(suppressWarnings(sapply(df, as.double)), na.rm=TRUE) > 0, ]
# x y z
# 1 a 1 2
# 2 b 0 3
# 3 c 1 NA
