How to remove columns full of only NA values - r

Here is an example of the output when I run the is.na() function on my data frame.
start_lat start_lng end_lat end_lng member_casual ride_length day_of_week X X.1 X.2
[1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
The "X", "X.1", and "X.2" columns were added to my data frame and I don't know where they came from. I tried the na.omit() function, but the columns are not recognized; in other words, they are not valid names. Can someone please help me remove these columns from my data frame?

## figure out which columns are all NA values
all_na_cols = sapply(your_data, \(x) all(is.na(x)))
## drop them
your_data = your_data[!all_na_cols]
Running na.omit() on a data frame drops every row that contains one or more NA values, so it is not what you want here.
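To see why na.omit() is the wrong tool here, a minimal sketch on made-up data (the data frame `toy` and its values are hypothetical): because the all-NA column puts an NA in every row, every row is dropped while the column itself survives.

```r
# Hypothetical toy data: one real column and one column that is entirely NA
toy <- data.frame(ride_length = c(10, 25, 7), X = NA)

# na.omit() works row-wise: each row contains an NA (in X), so all rows are removed
cleaned <- na.omit(toy)

nrow(cleaned)   # number of rows left
names(cleaned)  # the X column is still there
```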
The "X", "X.1", and "X.2" columns are added to my dataframe and I don't know where they came from.
That would worry me a lot. If I were you, I'd go back through your script and run it one line at a time until I found out where those columns came from, and then I'd fix the source of the problem there rather than putting a bandage on it here.
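One common source of columns named X, X.1 and X.2 is a CSV file whose header row ends in empty fields (for example, a spreadsheet export with stray trailing commas). Whether that is what happened in your case is an assumption; the sketch below only reproduces the effect with made-up inline text instead of a real file.

```r
# Made-up CSV text: every line ends with three empty fields
csv_text <- "start_lat,end_lat,,,
41.9,41.8,,,
41.7,41.6,,,"

rides <- read.csv(text = csv_text)

# The empty header fields are turned into syntactic, unique names
original_names <- names(rides)
original_names

# The empty columns contain only NA, so they can be dropped
# with the same approach as in the answer above
all_na_cols <- sapply(rides, function(x) all(is.na(x)))
rides <- rides[!all_na_cols]
names(rides)
```

If this turns out to be the cause, cleaning the file (or the export step) removes the columns at the source.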

A tidyverse solution
Using dplyr::select()
Make some dummy data:
require(dplyr)
myData <- tibble(a = c(1,2,3,4), b = c("a", "b", "c", "d"),
c = c(NA, NA, NA, NA), d = c(NA, "not_na", "not_na", NA))
myData
#> # A tibble: 4 x 4
#> a b c d
#> <dbl> <chr> <lgl> <chr>
#> 1 1 a NA <NA>
#> 2 2 b NA not_na
#> 3 3 c NA not_na
#> 4 4 d NA <NA>
Select only the columns that are not all NA
myNewData <- select(myData, where(function(x) !all(is.na(x))))
myNewData
#> # A tibble: 4 x 3
#> a b d
#> <dbl> <chr> <chr>
#> 1 1 a <NA>
#> 2 2 b not_na
#> 3 3 c not_na
#> 4 4 d <NA>
Created on 2022-02-16 by the reprex package (v2.0.1)

Related

Another "how to deal with NAs in logical statements" question

Short version: I need to get a result column r like this, ideally using dplyr (but I'm happy with base R as well):
d <- tibble(c1 = c(T,T,F,T,F,NA), c2 = c(T,F,F,F,F,NA), c3 = c(T,F,F,NA,NA,NA))
d %>% rowwise() %>% mutate(r = something())
# A tibble: 6 x 4
c1 c2 c3 r
<lgl> <lgl> <lgl> <lgl>
1 TRUE TRUE TRUE TRUE
2 TRUE FALSE FALSE TRUE
3 FALSE FALSE FALSE FALSE
4 TRUE FALSE NA TRUE
5 FALSE FALSE NA FALSE
6 NA NA NA NA
I understand why NA|FALSE == NA. Each TRUE/FALSE in this table is the result of a comparison, and I would really like to keep the syntax as short as possible.
Long version:
I have survey results, and need to create a summary of three questions asking for the primary, secondary and tertiary 'route to something' (there are more than 3 levels in reality). The summary should tell me, for each respondent, whether they made use of route A, route B, etc. Not all respondents filled in all questions, so there might be NAs. Some respondents didn't answer any of the questions at all, and their summary should be NA. So I have:
df <- tibble(primary = c("C", "A", "B", "D", NA),
secondary = c("B", "D", "C", NA, NA),
tertiary = c("A", "E", NA, NA, NA))
# I think I need something along these lines:
df <- df %>% rowwise() %>%
mutate(
routeA = (primary == "A") | (secondary == "A") | (tertiary == "A") ...
routeB = ....
)
# Result expected
df
# A tibble:
primary secondary tertiary routeA routeB ...
<chr> <chr> <chr> <lgl> <lgl>
C B A TRUE TRUE
A D E TRUE FALSE
B C NA FALSE TRUE
D NA NA FALSE FALSE
NA NA NA NA NA
You can do this relatively efficiently with apply and match from base R:
f <- function(x, levels) {
if (all(is.na(x))) {
rep.int(NA, length(levels))
} else {
as.logical(match(levels, x, 0L))
}
}
lv <- LETTERS[1:5]
df[paste0("route", lv)] <- t(apply(df, 1L, f, levels = lv))
df
## # A tibble: 5 × 8
## primary secondary tertiary routeA routeB routeC routeD routeE
## <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
## 1 C B A TRUE TRUE TRUE FALSE FALSE
## 2 A D E TRUE FALSE FALSE TRUE TRUE
## 3 B C NA FALSE TRUE TRUE FALSE FALSE
## 4 D NA NA FALSE FALSE FALSE TRUE FALSE
## 5 NA NA NA NA NA NA NA NA
I say "relatively" because rowwise operations on data frames tend to be less efficient than rowwise operations on matrices, requiring coercions to and from matrix or reshaping to and from long format.
This case is no exception, as apply coerces df from data frame to matrix and the assignment coerces the result of t from matrix to data frame.
Suboptimal:
my_match <- function(x, val) {
if (all(is.na(x))) return(NA)
return(any(na.omit(x) == val))
}
df %>% rowwise() %>% mutate(rA = my_match(c_across(where(is.character)), "A"),
rB = my_match(c_across(where(is.character)), "B"))
To be improved:
this won't scale well to larger numbers of routes
too much repeated code (another way of saying the same thing) — but I'm not quite sure how to create a function/shortcut version of this (could loop over the possible sites adding one column at a time, but I don't feel like going quite as far as necessary down the rlang/tidy-evaluation/NSE rabbit hole right now ...)
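One way to avoid the repetition without going into tidy evaluation is a plain loop over the route labels in base R. This is only a sketch: it assumes the possible routes are the letters A to E (as in the other answer) and that the three answer columns are the only columns to search, and it uses a plain data.frame in place of the tibble.

```r
df <- data.frame(
  primary   = c("C", "A", "B", "D", NA),
  secondary = c("B", "D", "C", NA, NA),
  tertiary  = c("A", "E", NA, NA, NA),
  stringsAsFactors = FALSE
)

routes    <- LETTERS[1:5]                     # assumed set of possible routes
answers   <- as.matrix(df)                    # character matrix of the answers
no_answer <- rowSums(!is.na(answers)) == 0    # respondents who answered nothing

for (r in routes) {
  # TRUE if any answered question in the row equals this route
  used <- rowSums(answers == r, na.rm = TRUE) > 0
  used[no_answer] <- NA                       # no answers at all: summary is NA
  df[[paste0("route", r)]] <- used
}
df
```

The answers are copied into a matrix before the loop, so the new route columns added inside the loop are never searched themselves.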
As mentioned in the comments, this is straightforward when the data is reshaped to long format and then back to wide.
library(tidyr)
library(dplyr)
library(tibble)
df <- df %>%
rowid_to_column()
df %>%
pivot_longer(-rowid) %>%
filter(!is.na(value)) %>%
pivot_wider(id_cols = rowid, names_from = value, values_fill = FALSE, values_fn = ~ TRUE, names_sort = TRUE) %>%
left_join(df, ., by = "rowid")
# A tibble: 5 x 9
rowid primary secondary tertiary A B C D E
<int> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
1 1 C B A TRUE TRUE TRUE FALSE FALSE
2 2 A D E TRUE FALSE FALSE TRUE TRUE
3 3 B C NA FALSE TRUE TRUE FALSE FALSE
4 4 D NA NA FALSE FALSE FALSE TRUE FALSE
5 5 NA NA NA NA NA NA NA NA
Another idea is:
ans = unclass(table(row(df), unlist(df)))
ans
# A B C D E
# 1 1 1 1 0 0
# 2 1 0 0 1 1
# 3 0 1 1 0 0
# 4 0 0 0 1 0
# 5 0 0 0 0 0
Missing values can also be filled in where appropriate (here, rows with no answers at all):
ans[!rowSums(ans)] = NA
ans

Using grepl to match character in a string of characters with delimiters

There are a number of solutions for using grepl(), but none which solves my problem (that I have come across so far). I have two data frames. The first labelled x containing a set of combinations associated with a letter:
structure(list(variable = c("A", "B", "C", "D"), combinations = c("16, 17, 18",
"17,18", "16,18", "12,3")), class = "data.frame", row.names = c(NA,
-4L))
> x
variable combinations
1 A 16, 17, 18
2 B 17,18
3 C 16,18
4 D 12,3
The second data frame is the results. It is a set of observations showing the letters that a species interacted with. Below is just one set of observations:
structure(list(variable = c("A, C", NA, NA), species = c("16",
"17", "18"), active = c("16", NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
> y
variable species active
1 A, C 16 16
2 <NA> 17 <NA>
3 <NA> 18 <NA>
This was the original structure of y:
> y
variable species.active species.present
1 A, C 16 17,18
The structure was changed to add more columns associated to each species (so each species had a row), thus the structure serves a specific purpose.
What I want is a binary column (T/F or 0/1) showing whether or not each species is in the combinations associated with the variable.
This is what I have managed so far:
library(zoo)
library(dplyr)
#carry locf so that each species are assigned the same variables
y <- y %>%
mutate(variable = zoo::na.locf(variable))
#separate each row to separate combinations
library(tidyr)
y <- separate_rows(y, variable)
#match x$variable by y$variable to add associated combinations in a new column in y
y$combinations <- ifelse(y$variable %in% x$variable, x$combinations)
#return true or false if each species is in the combination
y$type <- grepl(y$species, y$combinations);y
> y
variable species active combinations type
<chr> <chr> <chr> <chr> <lgl>
1 A 16 16 16, 17, 18 TRUE
2 C 16 16 17,18 FALSE
3 A 17 NA 16,18 TRUE
4 C 17 NA 12,3 FALSE
5 A 18 NA 16, 17, 18 TRUE
6 C 18 NA 17,18 FALSE
As you can see, the combinations are wrong and grepl() returns incorrect T/F values (see row 3, where it says TRUE even though '17' is not in the combination).
If anyone could help, that would be greatly appreciated.
Try this, choosing one of type1 or type2 (same result), whichever you prefer.
library(dplyr)
left_join(y, x, by = "variable") %>%
mutate(
type1 = mapply(`%in%`, species, strsplit(combinations, "\\D+")),
type2 = mapply(grepl, paste0("\\b", species, "\\b"), combinations)
)
# # A tibble: 6 x 6
# variable species active combinations type1 type2
# <chr> <chr> <chr> <chr> <lgl> <lgl>
# 1 A 16 16 16, 17, 18 TRUE TRUE
# 2 C 16 16 16,18 TRUE TRUE
# 3 A 17 <NA> 16, 17, 18 TRUE TRUE
# 4 C 17 <NA> 16,18 FALSE FALSE
# 5 A 18 <NA> 16, 17, 18 TRUE TRUE
# 6 C 18 <NA> 16,18 TRUE TRUE
Or starting with the original y:
y
# variable species active
# 1 A, C 16 16
# 2 <NA> 17 <NA>
# 3 <NA> 18 <NA>
y %>%
mutate(variable = zoo::na.locf(variable)) %>%
tidyr::separate_rows(variable) %>%
left_join(., x, by = "variable") %>%
mutate(type1 = mapply(`%in%`, species, strsplit(combinations, "\\D+")), type2 = mapply(grepl, paste0("\\b", species, "\\b"), combinations))
# # A tibble: 6 x 6
# variable species active combinations type1 type2
# <chr> <chr> <chr> <chr> <lgl> <lgl>
# 1 A 16 16 16, 17, 18 TRUE TRUE
# 2 C 16 16 16,18 TRUE TRUE
# 3 A 17 <NA> 16, 17, 18 TRUE TRUE
# 4 C 17 <NA> 16,18 FALSE FALSE
# 5 A 18 <NA> 16, 17, 18 TRUE TRUE
# 6 C 18 <NA> 16,18 TRUE TRUE
FYI, some things wrong with your question:
When asking questions that involve warnings or errors, you need to include them. In this case, grepl's pattern argument must be length 1, and it appears you are ignoring the warning that says so:
grepl(y$species, y$combinations)
# Warning in grepl(y$species, y$combinations) :
# argument 'pattern' has length > 1 and only the first element will be used
ifelse in your code seems to work, but you are using it incorrectly: it requires a no= argument as well, so there needs to be something as its third argument. It does not error here because everything resolves to be true (which is another problem) so it never attempts to evaluate no=.
ifelse(c(T,T), 1:2)
# [1] 1 2
ifelse(c(T,F), 1:2)
# Error in ifelse(c(T, F), 1:2) : argument "no" is missing, with no default
ifelse(c(T,F), 1:2, 11:12)
# [1] 1 12
What you're attempting to do is merge/join x and y, so the tools you want are among base::merge and dplyr::*_join (for starters, others exist). To better understand what's going on in a join, I suggest you see How to join (merge) data frames (inner, outer, left, right), https://stackoverflow.com/a/6188334/3358272.
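For comparison, the same join can be sketched with base::merge instead of dplyr. The data frame `y_long` below is typed in by hand to stand for y after the na.locf() and separate_rows() steps, and the anonymous helper passed to mapply is only illustrative.

```r
x <- data.frame(
  variable     = c("A", "B", "C", "D"),
  combinations = c("16, 17, 18", "17,18", "16,18", "12,3"),
  stringsAsFactors = FALSE
)

# y after filling down and splitting "A, C" into separate rows (entered by hand)
y_long <- data.frame(
  variable = c("A", "C", "A", "C", "A", "C"),
  species  = c("16", "16", "17", "17", "18", "18"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of y_long and look up its combinations in x
joined <- merge(y_long, x, by = "variable", all.x = TRUE)

# Split each combinations string on runs of non-digits, then test membership row by row
joined$type <- mapply(
  function(sp, combo) sp %in% strsplit(combo, "\\D+")[[1]],
  joined$species, joined$combinations,
  USE.NAMES = FALSE
)
joined
```

Note that merge() sorts the result by the key column by default, so the row order differs from the left_join() output even though the matches are the same.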

Binding dataframes with different column names by row

I imported this Excel sheet as a list of data frames and want to merge the list into one data frame. bind_rows() allows me to easily stack the data frames, but one variable/column has a different name in each data frame. By default, bind_rows() creates two separate columns, filled with NA for the rows that come from the other data frame. How can I combine these columns?
Sample code:
# Sample dataframes
df1 <- tibble(A = c(1,2,3),
B = c("X","Y","Z"),
C = c(T,F,F)
)
df2 <- tibble(A = c(3,4,5),
B = c("U","V","W"),
D = c(T,T,F)
)
# List of dataframes
my_ls <- list(df1, df2)
my_ls
[[1]]
# A tibble: 3 x 3
A B C
<dbl> <chr> <lgl>
1 1 X TRUE
2 2 Y FALSE
3 3 Z FALSE
[[2]]
# A tibble: 3 x 3
A B D
<dbl> <chr> <lgl>
1 3 U TRUE
2 4 V TRUE
3 5 W FALSE
# Creating joined dataframe:
my_df <- bind_rows(my_ls)
my_df
# Current outcome: A tibble: 6 x 4
A B C D
<dbl> <chr> <lgl> <lgl>
1 1 X TRUE NA
2 2 Y FALSE NA
3 3 Z FALSE NA
4 3 U NA TRUE
5 4 V NA TRUE
6 5 W NA FALSE
The desired outcome:
# Desired outcome: A tibble: 6 x 3
A B C
<dbl> <chr> <lgl>
1 1 X TRUE
2 2 Y FALSE
3 3 Z FALSE
4 3 U TRUE
5 4 V TRUE
6 5 W FALSE
Currently, I've been using mutate() with case_when(), where I check which column is not empty (!is.na()). This works, but I can't help but think there must be an easier way.
# Example using mutate
my_df <- my_df %>%
mutate(
C = case_when(is.na(C) & !is.na(D) ~ D,
!is.na(C) & is.na(D) ~ C,
# The lines below may be a bit redundant for my purpose, since the dataframes either have the C or D variable.
!is.na(C) & !is.na(D) ~ C, # Better would be to return that variable has overlapping information
is.na(C) & is.na(D) ~ NA
)
) %>%
select(-D)
my_df
# A tibble: 6 x 3
A B C
<dbl> <chr> <lgl>
1 1 X TRUE
2 2 Y FALSE
3 3 Z FALSE
4 3 U TRUE
5 4 V TRUE
6 5 W FALSE
You can bind_rows and then select non-NA value using coalesce :
library(dplyr)
bind_rows(my_ls) %>% mutate(C = coalesce(C, D)) %>% select(A:C)
# A B C
# <dbl> <chr> <lgl>
#1 1 X TRUE
#2 2 Y FALSE
#3 3 Z FALSE
#4 3 U TRUE
#5 4 V TRUE
#6 5 W FALSE
Following the comment by @KarthikS, you can rename your columns before binding. My approach using rename_with does not require the columns to be in a specific order. To illustrate this I used somewhat different example dataframes:
library(purrr)
library(dplyr)
d1 <- data.frame(A = 1, B = 2, C = 3)
d2 <- data.frame(A = 4, B = 5, D = 6)
d3 <- data.frame(D = 7, A = 8, B = 9)
d <- list(d1, d2, d3)
map(d, ~ rename_with(.x, ~ "C", matches("^D$"))) %>%
bind_rows()
#> A B C
#> 1 1 2 3
#> 2 4 5 6
#> 3 8 9 7
And now for your dataset:
d <- list(df1, df2)
map(d, ~ rename_with(.x, ~ "C", matches("^D$"))) %>%
bind_rows()
#> # A tibble: 6 x 3
#> A B C
#> <dbl> <chr> <lgl>
#> 1 1 X TRUE
#> 2 2 Y FALSE
#> 3 3 Z FALSE
#> 4 3 U TRUE
#> 5 4 V TRUE
#> 6 5 W FALSE
And if we add an additional one with a different column order:
df3 <- tibble(D = c(T,T,F),
A = c(7,8,9),
B = c("A","B","C"))
d <- list(df1, df2, df3)
map(d, ~ rename_with(.x, ~ "C", matches("^D$"))) %>%
bind_rows()
#> # A tibble: 9 x 3
#> A B C
#> <dbl> <chr> <lgl>
#> 1 1 X TRUE
#> 2 2 Y FALSE
#> 3 3 Z FALSE
#> 4 3 U TRUE
#> 5 4 V TRUE
#> 6 5 W FALSE
#> 7 7 A TRUE
#> 8 8 B TRUE
#> 9 9 C FALSE
Created on 2020-10-16 by the reprex package (v0.3.0)
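The same rename-then-bind idea can also be sketched without purrr or dplyr. The helper `rename_col` is a hypothetical name introduced here, and the sketch relies on rbind() for data frames matching columns by name rather than by position.

```r
d1 <- data.frame(A = 1, B = 2, C = 3)
d2 <- data.frame(A = 4, B = 5, D = 6)
d3 <- data.frame(D = 7, A = 8, B = 9)

# Hypothetical helper: rename column `from` to `to` when it is present
rename_col <- function(d, from = "D", to = "C") {
  names(d)[names(d) == from] <- to
  d
}

renamed  <- lapply(list(d1, d2, d3), rename_col)

# rbind() on data frames lines columns up by name, so column order does not matter
combined <- do.call(rbind, renamed)
combined
```

Unlike bind_rows(), rbind() requires every data frame to have the same set of column names after renaming.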
Apologies for breaking out of the tidyverse for a quick answer
expl <- read.table(text= " A B C D
1 1 X TRUE NA
2 2 Y FALSE NA
3 3 Z FALSE NA
4 3 U NA TRUE
5 4 V NA TRUE
6 5 W NA FALSE")
expl$E <- ifelse(is.na(expl$C), expl$D, expl$C)
print(expl)
or maybe
expl[,c("C", "D")] %>% rowMeans(na.rm = TRUE) %>% as.logical()
EDIT: Translated the latter to tidy:
expl %>% select("C", "D") %>% rowMeans(na.rm = TRUE) %>% as.logical()
EDIT after first comment:
If you want more control you should probably write the things you want to do in each case in a function similar to the following example:
library(magrittr)
expl <- read.table(text= " A B C D
1 1 X TRUE NA
2 2 Y FALSE NA
3 3 Z FALSE NA
4 3 U NA TRUE
5 4 V NA TRUE
6 5 W NA FALSE
7 7 I NA NA
8 9 J TRUE TRUE")
myfun <- function(a, b){
if(is.na(a) & is.na(b))
return(NA)
if(!is.na(a) & !is.na(b)) {
warning("too much information, a and b set!")
return(NaN)
}
return(max(a, b, na.rm=TRUE))
}
myfun = Vectorize(myfun)
myfun(expl$C, expl$D) %>% as.logical()

Add one variable to specify which one of several variables that equals a value

I'm learning R and looking for best practices here...
Main question
Given the tibble my_tibble:
# A tibble: 5 x 3
chkA chkB chkC
<chr> <chr> <chr>
1 NA NA NA
2 x NA NA
3 NA x NA
4 x NA NA
5 NA NA x
I want to create a variable/column checked that specifies which of the variables chkA, chkB or chkC equals "x". For each observation/row, at most one of those three variables can equal "x", while the rest of them are NA.
I can solve it with this code:
my_tibble <- my_tibble %>% mutate(checked = case_when(
chkA == "x" ~ "A",
chkB == "x" ~ "B",
chkC == "x" ~ "C",
TRUE ~ "(none)"
))
, which produces this tibble:
# A tibble: 5 x 4
chkA chkB chkC checked
<chr> <chr> <chr> <chr>
1 NA NA NA (none)
2 x NA NA A
3 NA x NA B
4 x NA NA A
5 NA NA x C
However, I assume/hope there could be a more convenient/elegant solution, perhaps a one-liner or something, as I think this is a frequent problem.
Bonus/follow-up question
For a "cleaner" tibble, I'd probably like to get rid of the chr variables and their NAs, by converting to lgl variables. I solved that with this code:
my_tibble <- my_tibble %>% mutate(chkA = ifelse(is.na(chkA), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkB = ifelse(is.na(chkB), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkC = ifelse(is.na(chkC), FALSE, TRUE))
, creating this tibble:
# A tibble: 5 x 3
chkA chkB chkC
<lgl> <lgl> <lgl>
1 FALSE FALSE FALSE
2 TRUE FALSE FALSE
3 FALSE TRUE FALSE
4 TRUE FALSE FALSE
5 FALSE FALSE TRUE
Is there a better way?
Here we can use across with mutate to handle multiple columns. Also, the output of is.na is logical and can be negated (!) to return the opposite, so there is no need for ifelse or case_when.
library(dplyr)
my_tibble %>%
select(starts_with('chk')) %>%
mutate(across(everything(), ~!is.na(.)))
# A tibble: 5 x 3
# chkA chkB chkC
# <lgl> <lgl> <lgl>
#1 FALSE FALSE FALSE
#2 TRUE FALSE FALSE
#3 FALSE TRUE FALSE
#4 TRUE FALSE FALSE
#5 FALSE FALSE TRUE
Or without anonymous function call
library(purrr)
my_tibble %>%
select(starts_with('chk')) %>%
mutate(across(everything(), negate(is.na)))
# A tibble: 5 x 3
# chkA chkB chkC
# <lgl> <lgl> <lgl>
#1 FALSE FALSE FALSE
#2 TRUE FALSE FALSE
#3 FALSE TRUE FALSE
#4 TRUE FALSE FALSE
#5 FALSE FALSE TRUE
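For reference, the same conversion can be sketched in base R without across(). The object `chk` is a plain data.frame copy of the example data, introduced here only for illustration.

```r
chk <- data.frame(
  chkA = c(NA, "x", NA, "x", NA),
  chkB = c(NA, NA, "x", NA, NA),
  chkC = c(NA, NA, NA, NA, "x"),
  stringsAsFactors = FALSE
)

# Pick the columns whose names start with "chk"
chk_cols <- grep("^chk", names(chk), value = TRUE)

# Replace each selected column with a logical vector: TRUE where a value was present
chk[chk_cols] <- lapply(chk[chk_cols], function(col) !is.na(col))
chk
```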
For creating the column name based on the occurrence of NA, a vectorized option is with max.col
nm1 <- sub('chk', '', names(my_tibble))[max.col(!is.na(my_tibble), 'first')]
nm1[!rowSums(!is.na(my_tibble))] <- NA_character_
my_tibble$checks <- nm1
my_tibble
# A tibble: 5 x 4
# chkA chkB chkC checks
# <chr> <chr> <chr> <chr>
#1 <NA> <NA> <NA> <NA>
#2 x <NA> <NA> A
#3 <NA> x <NA> B
#4 x <NA> <NA> A
#5 <NA> <NA> x C
A tidyverse option to create the column: first rename the columns by removing the 'chk' prefix, then use imap to loop over the columns and replace each non-NA element with its column name, use coalesce to return the first non-NA value per row, and finally add the result to 'my_tibble' as the 'checked' column.
library(stringr)
my_tibble <- my_tibble %>%
rename_all(~ str_remove(., 'chk')) %>%
imap_dfc(~ case_when(!is.na(.x) ~ .y)) %>%
invoke(coalesce, .) %>%
mutate(my_tibble, checked = .)
-output
my_tibble
# A tibble: 5 x 4
# chkA chkB chkC checked
# <chr> <chr> <chr> <chr>
#1 <NA> <NA> <NA> <NA>
#2 x <NA> <NA> A
#3 <NA> x <NA> B
#4 x <NA> <NA> A
#5 <NA> <NA> x C
data
my_tibble <- structure(list(chkA = c(NA, "x", NA, "x", NA), chkB = c(NA, NA,
"x", NA, NA), chkC = c(NA, NA, NA, NA, "x")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
One dplyr option for finding the column with x could be:
df %>%
rowwise() %>%
mutate(checked = names(.)[replace(which(c_across(everything()) == "x"), length(.) == 0, NA)])
chkA chkB chkC checked
<chr> <chr> <chr> <chr>
1 <NA> <NA> <NA> <NA>
2 x <NA> <NA> chkA
3 <NA> x <NA> chkB
4 x <NA> <NA> chkA
5 <NA> <NA> x chkC

Value matching with NA - missing values - using mutate

I am somewhat stuck. Is there a better way than the below to do value matching considering NAs as "real values" within mutate?
library(dplyr)
data_foo <- data.frame(A= c(1:2, NA, 4, NA), B = c(1, 3, NA, NA, 4))
Not the desired output:
data_foo %>% mutate(irr = A==B)
#> A B irr
#> 1 1 1 TRUE
#> 2 2 3 FALSE
#> 3 NA NA NA
#> 4 4 NA NA
#> 5 NA 4 NA
data_foo %>% rowwise() %>% mutate(irr = A%in%B)
#> Source: local data frame [5 x 3]
#> Groups: <by row>
#>
#> # A tibble: 5 x 3
#> A B irr
#> <dbl> <dbl> <lgl>
#> 1 1 1 TRUE
#> 2 2 3 FALSE
#> 3 NA NA FALSE
#> 4 4 NA FALSE
#> 5 NA 4 FALSE
Desired output: The code below produces the desired column, irr. I am using these somewhat cumbersome helper columns. Is there a shorter way?
data_foo %>%
mutate(NA_A = is.na(A),
NA_B = is.na(B),
irr = if_else(is.na(A)|is.na(B), NA_A == NA_B, A == B))
#> A B NA_A NA_B irr
#> 1 1 1 FALSE FALSE TRUE
#> 2 2 3 FALSE FALSE FALSE
#> 3 NA NA TRUE TRUE TRUE
#> 4 4 NA FALSE TRUE FALSE
#> 5 NA 4 TRUE FALSE FALSE
Using map2
library(tidyverse)
data_foo %>%
mutate(irr = map2_lgl(A, B, `%in%`))
# A B irr
#1 1 1 TRUE
#2 2 3 FALSE
#3 NA NA TRUE
#4 4 NA FALSE
#5 NA 4 FALSE
Or with setequal
data_foo %>%
rowwise %>%
mutate(irr = setequal(A, B))
The above method is concise, but it is also loopy. We can replace the NA with a different value and then do the ==
data_foo %>%
mutate_all(list(new = ~ replace_na(., -999))) %>%
transmute(A, B, irr = A_new == B_new)
# A B irr
#1 1 1 TRUE
#2 2 3 FALSE
#3 NA NA TRUE
#4 4 NA FALSE
#5 NA 4 FALSE
Or with bind_cols and reduce
data_foo %>%
mutate_all(replace_na, -999) %>%
reduce(`==`) %>%
bind_cols(data_foo, irr = .)
Maybe simpler than akrun's answer?
Either of the two ways below will produce the expected result. Note that as.character won't do it, because the return value of as.character(NA) is NA_character_.
data_foo %>%
mutate(irr = paste(A) == paste(B))
data_foo %>%
mutate(irr = sQuote(A) == sQuote(B))
#Source: local data frame [5 x 3]
#Groups: <by row>
#
## A tibble: 5 x 3
# A B irr
# <dbl> <dbl> <lgl>
#1 1 1 TRUE
#2 2 3 FALSE
#3 NA NA TRUE
#4 4 NA FALSE
#5 NA 4 FALSE
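The difference between as.character() and paste() on missing values can be checked directly. The short sketch below also shows one caveat of the paste() approach that is worth keeping in mind for character columns; the variable names are only illustrative.

```r
chr_na    <- as.character(NA)   # still a missing value (NA_character_)
pasted_na <- paste(NA)          # the ordinary two-character string "NA"

is.na(chr_na)
pasted_na

# Consequence: after paste(), two missing values compare as equal
same_na <- paste(c(1, NA)) == paste(c(1, NA))
same_na

# Caveat: a literal string "NA" also compares equal to a missing value
clash <- paste(c("NA", "b")) == paste(c(NA, "b"))
clash
```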
Edit.
Following the comments below I have updated the code and it now follows akrun's suggestion.
There is also the excellent idea in tmfmnk's answer. I use a similar one in yet another way of solving the question's problem.
The documentation of all.equal says that
Do not use all.equal directly in if expressions—either use
isTRUE(all.equal(....)) or identical if appropriate.
Though there is no if expression in mutate, I believe that it is more stable than identical and has the same effect if the values being compared are (sort of/in fact) equal.
data_foo %>%
rowwise() %>%
mutate(irr = isTRUE(all.equal(A, B)))
Could also be a possibility:
data_foo %>%
rowwise() %>%
mutate(irr = identical(A, B)) %>%
ungroup()
A B irr
<dbl> <dbl> <lgl>
1 1 1 TRUE
2 2 3 FALSE
3 NA NA TRUE
4 4 NA FALSE
5 NA 4 FALSE
The coalesce function is useful if you want to perform an action when a value is NA
data_foo %>%
mutate(irr = coalesce(A == B, is.na(A) & is.na(B)))
# A B irr
# 1 1 1 TRUE
# 2 2 3 FALSE
# 3 NA NA TRUE
# 4 4 NA FALSE
# 5 NA 4 FALSE
Same thing for > 2 columns
data_foo %>%
mutate(irr = coalesce(reduce(., `==`), rowMeans(is.na(.)) == 1))
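With more than two columns, chaining `==` through reduce compares the logical result of the first comparison with the third column, so it may be worth checking the result on your own data. As an alternative sketch in base R, a row can be treated as matching when it holds exactly one distinct value, with NA counted as a value. The data frame `data_bar` and its third column C are made up for illustration.

```r
data_bar <- data.frame(
  A = c(1, 2, NA, 4, NA),
  B = c(1, 3, NA, NA, 4),
  C = c(1, 2, NA, 4, 4)    # hypothetical third column
)

# unique() keeps NA as a value of its own, so a row of all NA counts as one
# distinct value, while a mix of NA and a number counts as two
data_bar$irr <- apply(
  data_bar[c("A", "B", "C")], 1,
  function(row) length(unique(row)) == 1
)
data_bar
```

This loops over rows, so it trades speed for clarity on large data.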

Resources