Short version: I need to get a results column r like this, ideally using dplyr (but I'm happy with base R as well):
d <- tibble(c1 = c(T,T,F,T,F,NA), c2 = c(T,F,F,F,F,NA), c3 = c(T,F,F,NA,NA,NA))
d %>% rowwise() %>% mutate(r = something())
# A tibble: 6 x 4
c1 c2 c3 r
<lgl> <lgl> <lgl> <lgl>
1 TRUE TRUE TRUE TRUE
2 TRUE FALSE FALSE TRUE
3 FALSE FALSE FALSE FALSE
4 TRUE FALSE NA TRUE
5 FALSE FALSE NA FALSE
6 NA NA NA NA
I understand why NA|FALSE == NA. Each TRUE/FALSE in this table is the result of a comparison, and I would really like to keep the syntax as short as possible.
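The NA behaviour comes from R's three-valued logic: `NA | TRUE` is `TRUE` (the answer is the same whatever the NA stands for), while `NA | FALSE` is `NA`. A minimal base-R sketch, not from the original post, that gets the `r` column without `rowwise()`: count the `TRUE`s in each row with `na.rm = TRUE`, then put `NA` back only for rows that are entirely missing. The data frame `d` mirrors the tibble above; `m` is an illustrative name.

```r
# The question's data as a plain data frame
d <- data.frame(c1 = c(TRUE, TRUE, FALSE, TRUE, FALSE, NA),
                c2 = c(TRUE, FALSE, FALSE, FALSE, FALSE, NA),
                c3 = c(TRUE, FALSE, FALSE, NA, NA, NA))

NA | TRUE    # TRUE: the result is TRUE whatever the NA stands for
NA | FALSE   # NA: the result depends on the unknown value

m <- as.matrix(d)
# TRUE if the row has at least one TRUE; NAs are ignored
d$r <- rowSums(m, na.rm = TRUE) > 0
# Rows where every value is missing go back to NA
d$r[rowSums(!is.na(m)) == 0] <- NA
d
```

Because `rowSums()` is vectorized over the whole matrix, this avoids the per-row overhead of `rowwise()`.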
Long version:
I have survey results, and need to create a summary of three questions asking for the primary, secondary and tertiary 'route to something' (there are more than 3 levels in reality). The summary should tell me, for each respondent, whether they made use of route A, route B, etc. Not all respondents answered all questions, so there might be NAs. Some respondents didn't answer any of the questions at all, and their summary should be NA. So I have:
df <- tibble(primary = c("C", "A", "B", "D", NA),
secondary = c("B", "D", "C", NA, NA),
tertiary = c("A", "E", NA, NA, NA))
# I think I need something along these lines:
df <- df %>% rowwise() %>%
mutate(
routeA = (primary == "A") | (secondary == "A") | (tertiary == "A") ...
routeB = ....
)
# Result expected
df
# A tibble:
primary secondary tertiary routeA routeB ...
<chr> <chr> <chr> <lgl> <lgl>
C B A TRUE TRUE
A D E TRUE FALSE
B C NA FALSE TRUE
D NA NA FALSE FALSE
NA NA NA NA NA
You can do this relatively efficiently with apply and match from base R:
f <- function(x, levels) {
if (all(is.na(x))) {
rep.int(NA, length(levels))
} else {
as.logical(match(levels, x, 0L))
}
}
lv <- LETTERS[1:5]
df[paste0("route", lv)] <- t(apply(df, 1L, f, levels = lv))
df
## # A tibble: 5 × 8
## primary secondary tertiary routeA routeB routeC routeD routeE
## <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
## 1 C B A TRUE TRUE TRUE FALSE FALSE
## 2 A D E TRUE FALSE FALSE TRUE TRUE
## 3 B C NA FALSE TRUE TRUE FALSE FALSE
## 4 D NA NA FALSE FALSE FALSE TRUE FALSE
## 5 NA NA NA NA NA NA NA NA
I say "relatively" because rowwise operations on data frames tend to be less efficient than rowwise operations on matrices, requiring coercions to and from matrix or reshaping to and from long format.
This case is no exception, as apply coerces df from data frame to matrix and the assignment coerces the result of t from matrix to data frame.
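One way to keep the work inside a matrix is to coerce once and loop over the (few) route levels instead of the (many) respondents, so each comparison is vectorized over the whole matrix. A base-R sketch under that assumption; `m` and `res` are illustrative names, and the data frame is rebuilt so the snippet runs on its own:

```r
df <- data.frame(primary   = c("C", "A", "B", "D", NA),
                 secondary = c("B", "D", "C", NA, NA),
                 tertiary  = c("A", "E", NA, NA, NA))
lv <- LETTERS[1:5]

# Coerce to a character matrix once
m <- as.matrix(df)

# One vectorized comparison per level; NA cells are skipped by na.rm
res <- vapply(lv, function(l) rowSums(m == l, na.rm = TRUE) > 0,
              logical(nrow(m)))

# Respondents who answered nothing get NA for every route
res[rowSums(!is.na(m)) == 0, ] <- NA
colnames(res) <- paste0("route", lv)

df <- cbind(df, res)
df
```

The loop runs once per level rather than once per row, which is usually the cheaper direction when respondents far outnumber routes.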
Suboptimal:
my_match <- function(x, val) {
if (all(is.na(x))) return(NA)
return(any(na.omit(x) == val))
}
df %>% rowwise() %>% mutate(rA = my_match(c_across(where(is.character)), "A"),
rB = my_match(c_across(where(is.character)), "B"))
To be improved:
this won't scale well to larger numbers of routes
too much repeated code (another way of saying the same thing) — but I'm not quite sure how to create a function/shortcut version of this (could loop over the possible sites adding one column at a time, but I don't feel like going quite as far as necessary down the rlang/tidy-evaluation/NSE rabbit hole right now ...)
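Adding one column per route does not necessarily require tidy evaluation: a plain `for` loop with `[[<-` and string column names is enough. A base-R sketch that reuses the `my_match()` helper from above (redefined here so the snippet is self-contained); `route_cols` is an illustrative name:

```r
my_match <- function(x, val) {
  if (all(is.na(x))) return(NA)
  any(na.omit(x) == val)
}

df <- data.frame(primary   = c("C", "A", "B", "D", NA),
                 secondary = c("B", "D", "C", NA, NA),
                 tertiary  = c("A", "E", NA, NA, NA))
route_cols <- c("primary", "secondary", "tertiary")

# One new column per route, named from a string
for (l in LETTERS[1:5]) {
  df[[paste0("route", l)]] <- apply(df[route_cols], 1, my_match, val = l)
}
df
```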
As mentioned in the comments, this is straightforward when the data is reshaped to long format and then back to wide.
library(tidyr)
library(dplyr)
library(tibble)
df <- df %>%
rowid_to_column()
df %>%
pivot_longer(-rowid) %>%
filter(!is.na(value)) %>%
pivot_wider(id_cols = rowid, names_from = value, values_fill = FALSE, values_fn = ~ TRUE, names_sort = TRUE) %>%
left_join(df, ., by = "rowid")
# A tibble: 5 x 9
rowid primary secondary tertiary A B C D E
<int> <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
1 1 C B A TRUE TRUE TRUE FALSE FALSE
2 2 A D E TRUE FALSE FALSE TRUE TRUE
3 3 B C NA FALSE TRUE TRUE FALSE FALSE
4 4 D NA NA FALSE FALSE FALSE TRUE FALSE
5 5 NA NA NA NA NA NA NA NA
Another idea is:
ans = unclass(table(row(df), unlist(df)))
ans
# A B C D E
# 1 1 1 1 0 0
# 2 1 0 0 1 1
# 3 0 1 1 0 0
# 4 0 0 0 1 0
# 5 0 0 0 0 0
Missing values can also be filled in where appropriate:
ans[!rowSums(ans)] = NA
ans
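The table above holds counts; if logical `routeX` columns like the ones in the question are wanted, the counts can simply be compared with zero. A short sketch along the same lines (`counts` and `used` are illustrative names; the data frame is the question's, without a `rowid` column):

```r
df <- data.frame(primary   = c("C", "A", "B", "D", NA),
                 secondary = c("B", "D", "C", NA, NA),
                 tertiary  = c("A", "E", NA, NA, NA))

# Cross-tabulate row index against answer; table() drops NA answers
counts <- unclass(table(row(df), unlist(df)))
# A route is "used" if it was named at least once
used <- counts > 0
# Respondents with no answers at all get NA
used[rowSums(counts) == 0, ] <- NA
colnames(used) <- paste0("route", colnames(used))
used
```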
Here is an example of the output when I run is.na() on my data frame:
start_lat start_lng end_lat end_lng member_casual ride_length day_of_week X X.1 X.2
[1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
The "x", "x.1", and "x.2" columns were added to my data frame and I don't know where they came from. I used the na.omit() function, but the columns are not recognized; in other words, they are not valid names. Can someone please help me remove these columns from my data frame?
## figure out which columns are all NA values
all_na_cols = sapply(your_data, \(x) all(is.na(x)))
## drop them
your_data = your_data[!all_na_cols]
Running na.omit() on a data frame drops every row that has one or more NA values, so it is not what you want here.
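To see why `na.omit()` is the wrong tool here, consider a small made-up data frame (illustrative, not the poster's data) where every row has an `NA` in the unwanted columns:

```r
your_data <- data.frame(a = 1:3, X = NA, X.1 = NA)

# na.omit() removes rows: every row contains an NA, so nothing is left
nrow(na.omit(your_data))   # 0

# Dropping the all-NA columns instead keeps every row
all_na_cols <- vapply(your_data, function(x) all(is.na(x)), logical(1))
your_data <- your_data[!all_na_cols]
names(your_data)           # "a"
```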
The "x", "x.1", and "x.2" columns are added to my dataframe and I don't know where they came from.
That would worry me a lot. If I were you, I'd go back through your script and run it one line at a time until I found out where those columns came from, and then I'd solve the source of the problem there rather than putting a bandage on it here.
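One common way such columns appear (an assumption here; the poster did not say how the data was read) is `read.csv()` on a file whose header row ends in empty fields, for example trailing commas left by a spreadsheet export. R turns the blank names into `X`, `X.1`, `X.2` via `make.names()`, and the empty cells become logical `NA`. A made-up two-row file shows the effect:

```r
# Hypothetical CSV content with three empty trailing columns
csv_text <- "start_lat,end_lat,,,
41.9,41.8,,,
41.7,41.6,,,"

raw <- read.csv(text = csv_text)
names(raw)   # "start_lat" "end_lat" "X" "X.1" "X.2"
```

If this matches the real file, fixing the export (or the file) removes the columns at their source.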
A tidyverse solution
Using dplyr::select()
Make some dummy data:
require(dplyr)
myData <- tibble(a = c(1,2,3,4), b = c("a", "b", "c", "d"),
c = c(NA, NA, NA, NA), d = c(NA, "not_na", "not_na", NA))
myData
#> # A tibble: 4 x 4
#> a b c d
#> <dbl> <chr> <lgl> <chr>
#> 1 1 a NA <NA>
#> 2 2 b NA not_na
#> 3 3 c NA not_na
#> 4 4 d NA <NA>
Select only the columns that are not all NA:
myNewData <- select(myData, where(function(x) !all(is.na(x))))
myNewData
#> # A tibble: 4 x 3
#> a b d
#> <dbl> <chr> <chr>
#> 1 1 a <NA>
#> 2 2 b not_na
#> 3 3 c not_na
#> 4 4 d <NA>
Created on 2022-02-16 by the reprex package (v2.0.1)
I have a data frame like this:
df<-tibble(id=c("ls1","ls1","ls1","ls2","ls2","ls3","ls5","ls5","ls10","ls10","ls14"),
target=c("A","A","B","G","H","A","B","B","G","HA","B"))
I would like to find the common values of the target column within each id group and also between id groups. The result could look something like the table below:
res<-tibble(id=c("ls1","ls1","ls1","ls2","ls2","ls3","ls5","ls5","ls10","ls10","ls14"),
target=c("A","A","B","G","H","A","B","B","G","HA","B"),
withinGroup=c(T,T,F,F,F,F,T,T,F,F,F),
numberofRepwithinGroup=c(2,2,1,1,1,1,2,2,1,1,1),
betweenGroups=c(T,T,T,T,F,T,T,T,T,F,T),
numberofRepbetweenGroups=c(2,2,3,2,0,2,3,3,2,0,3))
Any idea how to do it?
You can do it with a couple of mutate() calls:
library(dplyr)
df |>
# first group by
group_by(id, target) |>
# add the within columns
mutate(numberofRepwithinGroup = length(target),
withinGroup = ifelse(numberofRepwithinGroup > 1,T,F)) |>
# second group by
group_by(target) |>
# add the between columns
mutate(numberofRepbetweenGroups = ifelse(n_distinct(id) == 1, 0, n_distinct(id)),
betweenGroups = ifelse(numberofRepbetweenGroups > 0,T,F)) |>
# reorder columns
select(id,target, withinGroup, numberofRepwithinGroup, betweenGroups, numberofRepbetweenGroups
) |>
# remove useless grouping
ungroup()
# A tibble: 11 x 6
id target withinGroup numberofRepwithinGroup betweenGroups numberofRepbetweenGroups
<chr> <chr> <lgl> <int> <lgl> <dbl>
1 ls1 A TRUE 2 TRUE 2
2 ls1 A TRUE 2 TRUE 2
3 ls1 B FALSE 1 TRUE 3
4 ls2 G FALSE 1 TRUE 2
5 ls2 H FALSE 1 FALSE 0
6 ls3 A FALSE 1 TRUE 2
7 ls5 B TRUE 2 TRUE 3
8 ls5 B TRUE 2 TRUE 3
9 ls10 G FALSE 1 TRUE 2
10 ls10 HA FALSE 1 FALSE 0
11 ls14 B FALSE 1 TRUE 3
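For comparison, a base-R sketch of the same two counts using `ave()`, following this answer's definitions (occurrences of a target within an id; number of distinct ids sharing a target, with 0 when only one id has it). `id_code` and `n_ids` are illustrative names:

```r
df <- data.frame(
  id = c("ls1","ls1","ls1","ls2","ls2","ls3","ls5","ls5","ls10","ls10","ls14"),
  target = c("A","A","B","G","H","A","B","B","G","HA","B")
)

# Number of rows sharing the same id and target
df$numberofRepwithinGroup <- ave(seq_along(df$target), df$id, df$target,
                                 FUN = length)
df$withinGroup <- df$numberofRepwithinGroup > 1

# Distinct ids per target; ids coded as integers so ave() returns numbers
id_code <- match(df$id, unique(df$id))
n_ids <- ave(id_code, df$target, FUN = function(x) length(unique(x)))
# A target found in only one id is not shared between groups
df$numberofRepbetweenGroups <- ifelse(n_ids == 1, 0L, n_ids)
df$betweenGroups <- df$numberofRepbetweenGroups > 0
df
```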
Here is an option
library(dplyr)
get_reps <- function(x) as.numeric(table(x)[match(x, names(table(x)))] - 1)
df %>%
group_by(id) %>%
mutate(
withinGroup = duplicated(target) | duplicated(target, fromLast = T),
numberofRepwithinGroup = get_reps(target)) %>%
ungroup() %>%
mutate(
betweenGroups = duplicated(target) | duplicated(target, fromLast = T),
numberofRepbetweenGroups = get_reps(target))
## A tibble: 11 x 6
# id target withinGroup numberofRepwithinGroup betweenGroups numberofRepbetweenGroups
# <chr> <chr> <lgl> <dbl> <lgl> <dbl>
# 1 ls1 A TRUE 1 TRUE 2
# 2 ls1 A TRUE 1 TRUE 2
# 3 ls1 B FALSE 0 TRUE 3
# 4 ls2 G FALSE 0 TRUE 1
# 5 ls2 H FALSE 0 FALSE 0
# 6 ls3 A FALSE 0 TRUE 2
# 7 ls5 B TRUE 1 TRUE 3
# 8 ls5 B TRUE 1 TRUE 3
# 9 ls10 G FALSE 0 TRUE 1
#10 ls10 HA FALSE 0 FALSE 0
#11 ls14 B FALSE 0 TRUE 3
I imported this Excel sheet as a list of data frames, and I want to merge the list into one data frame. bind_rows() lets me easily stack the data frames, but one variable/column has a different name in each data frame. By default, bind_rows() creates two separate columns, with empty values for the rows that came from the other data frames. How can I join these columns?
Sample code:
# Sample dataframes
df1 <- tibble(A = c(1,2,3),
B = c("X","Y","Z"),
C = c(T,F,F)
)
df2 <- tibble(A = c(3,4,5),
B = c("U","V","W"),
D = c(T,T,F)
)
# List of dataframes
my_ls <- list(df1, df2)
my_ls
[[1]]
# A tibble: 3 x 3
A B C
<dbl> <chr> <lgl>
1 1 X TRUE
2 2 Y FALSE
3 3 Z FALSE
[[2]]
# A tibble: 3 x 3
A B D
<dbl> <chr> <lgl>
1 3 U TRUE
2 4 V TRUE
3 5 W FALSE
# Creating joined dataframe:
my_df <- bind_rows(my_ls)
my_df
# Current outcome: A tibble: 6 x 4
A B C D
<dbl> <chr> <lgl> <lgl>
1 1 X TRUE NA
2 2 Y FALSE NA
3 3 Z FALSE NA
4 3 U NA TRUE
5 4 V NA TRUE
6 5 W NA FALSE
The desired outcome:
# Desired outcome: A tibble: 6 x 3
A B C
<dbl> <chr> <lgl>
1 1 X TRUE
2 2 Y FALSE
3 3 Z FALSE
4 3 U TRUE
5 4 V TRUE
6 5 W FALSE
Currently, I've been using mutate() with case_when(), where I check which column is not empty (!is.na()). This works, but I can't help but think there must be an easier way.
# Example using mutate
my_df <- my_df %>%
mutate(
C = case_when(is.na(C) & !is.na(D) ~ D,
!is.na(C) & is.na(D) ~ C,
# The lines below may be a bit redundant for my purpose, since the dataframes either have the C or D variable.
!is.na(C) & !is.na(D) ~ C, # Better would be to return that variable has overlapping information
is.na(C) & is.na(D) ~ NA
)
) %>%
select(-D)
my_df
# A tibble: 6 x 3
A B C
<dbl> <chr> <lgl>
1 1 X TRUE
2 2 Y FALSE
3 3 Z FALSE
4 3 U TRUE
5 4 V TRUE
6 5 W FALSE
You can use bind_rows() and then pick the non-NA value with coalesce():
library(dplyr)
bind_rows(my_ls) %>% mutate(C = coalesce(C, D)) %>% select(A:C)
# A B C
# <dbl> <chr> <lgl>
#1 1 X TRUE
#2 2 Y FALSE
#3 3 Z FALSE
#4 3 U TRUE
#5 4 V TRUE
#6 5 W FALSE
Following the comment by @KarthikS, you can rename your columns before binding. My approach using rename_with() does not require the columns to be in a specific order. To illustrate this I used somewhat different example data frames:
library(purrr)
library(dplyr)
d1 <- data.frame(A = 1, B = 2, C = 3)
d2 <- data.frame(A = 4, B = 5, D = 6)
d3 <- data.frame(D = 7, A = 8, B = 9)
d <- list(d1, d2, d3)
map(d, ~ rename_with(.x, ~ "C", matches("^D$"))) %>%
bind_rows()
#> A B C
#> 1 1 2 3
#> 2 4 5 6
#> 3 8 9 7
And now for your dataset:
d <- list(df1, df2)
map(d, ~ rename_with(.x, ~ "C", matches("^D$"))) %>%
bind_rows()
#> # A tibble: 6 x 3
#> A B C
#> <dbl> <chr> <lgl>
#> 1 1 X TRUE
#> 2 2 Y FALSE
#> 3 3 Z FALSE
#> 4 3 U TRUE
#> 5 4 V TRUE
#> 6 5 W FALSE
And if we add an additional one with a different column order:
df3 <- tibble(D = c(T,T,F),
A = c(7,8,9),
B = c("A","B","C"))
d <- list(df1, df2, df3)
map(d, ~ rename_with(.x, ~ "C", matches("^D$"))) %>%
bind_rows()
#> # A tibble: 9 x 3
#> A B C
#> <dbl> <chr> <lgl>
#> 1 1 X TRUE
#> 2 2 Y FALSE
#> 3 3 Z FALSE
#> 4 3 U TRUE
#> 5 4 V TRUE
#> 6 5 W FALSE
#> 7 7 A TRUE
#> 8 8 B TRUE
#> 9 9 C FALSE
Created on 2020-10-16 by the reprex package (v0.3.0)
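The same rename-then-bind idea also works without purrr or dplyr, because `rbind()` on data frames matches columns by name. A base-R sketch, assuming the only mismatch is `D` versus `C`:

```r
df1 <- data.frame(A = c(1, 2, 3), B = c("X", "Y", "Z"), C = c(TRUE, FALSE, FALSE))
df2 <- data.frame(A = c(3, 4, 5), B = c("U", "V", "W"), D = c(TRUE, TRUE, FALSE))
df3 <- data.frame(D = c(TRUE, TRUE, FALSE), A = c(7, 8, 9), B = c("A", "B", "C"))
my_ls <- list(df1, df2, df3)

# Rename D to C in every data frame that has it
my_ls <- lapply(my_ls, function(d) {
  names(d)[names(d) == "D"] <- "C"
  d
})

# rbind() matches data frame columns by name, so column order does not matter
my_df <- do.call(rbind, my_ls)
my_df
```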
Apologies for breaking out of the tidyverse for a quick answer:
expl <- read.table(text= " A B C D
1 1 X TRUE NA
2 2 Y FALSE NA
3 3 Z FALSE NA
4 3 U NA TRUE
5 4 V NA TRUE
6 5 W NA FALSE")
expl$E <- ifelse(is.na(expl$C), expl$D, expl$C)
print(expl)
or maybe
expl[,c("C", "D")] %>% rowMeans(na.rm = TRUE) %>% as.logical()
EDIT: Translated the latter to tidy:
expl %>% select("C", "D") %>% rowMeans(na.rm = TRUE) %>% as.logical()
EDIT after first comment:
If you want more control you should probably write the things you want to do in each case in a function similar to the following example:
library(magrittr)
expl <- read.table(text= " A B C D
1 1 X TRUE NA
2 2 Y FALSE NA
3 3 Z FALSE NA
4 3 U NA TRUE
5 4 V NA TRUE
6 5 W NA FALSE
7 7 I NA NA
8 9 J TRUE TRUE")
myfun <- function(a, b){
if(is.na(a) & is.na(b))
return(NA)
if(!is.na(a) & !is.na(b)) {
warning("too much information, a and b set!")
return(NaN)
}
return(max(a, b, na.rm=TRUE))
}
myfun = Vectorize(myfun)
myfun(expl$C, expl$D) %>% as.logical()
I'm learning R and looking for best practices here...
Main question
Given the tibble my_tibble:
# A tibble: 5 x 3
chkA chkB chkC
<chr> <chr> <chr>
1 NA NA NA
2 x NA NA
3 NA x NA
4 x NA NA
5 NA NA x
I want to create a variable/column checked that specifies which of the variables chkA, chkB or chkC equals "x". For each observation/row, at most one of those three variables can equal "x", while the rest are NA.
I can solve it with this code:
my_tibble <- my_tibble %>% mutate(checked = case_when(
chkA == "x" ~ "A",
chkB == "x" ~ "B",
chkC == "x" ~ "C",
TRUE ~ "(none)"
))
This produces the following tibble:
# A tibble: 5 x 4
chkA chkB chkC checked
<chr> <chr> <chr> <chr>
1 NA NA NA (none)
2 x NA NA A
3 NA x NA B
4 x NA NA A
5 NA NA x C
However, I assume/hope there could be a more convenient/elegant solution, perhaps a one-liner or something, as I think this is a frequent problem.
Bonus/follow-up question
For a "cleaner" tibble, I'd probably like to get rid of the chr variables and their NAs, by converting to lgl variables. I solved that with this code:
my_tibble <- my_tibble %>% mutate(chkA = ifelse(is.na(chkA), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkB = ifelse(is.na(chkB), FALSE, TRUE))
my_tibble <- my_tibble %>% mutate(chkC = ifelse(is.na(chkC), FALSE, TRUE))
This creates the following tibble:
# A tibble: 5 x 3
chkA chkB chkC
<lgl> <lgl> <lgl>
1 FALSE FALSE FALSE
2 TRUE FALSE FALSE
3 FALSE TRUE FALSE
4 TRUE FALSE FALSE
5 FALSE FALSE TRUE
Is there a better way?
Here we can use across() with mutate() to transform multiple columns. Also, the output of is.na() is logical and can be negated with ! to get the opposite, instead of using ifelse() or case_when():
library(dplyr)
my_tibble %>%
select(starts_with('chk')) %>%
mutate(across(everything(), ~!is.na(.)))
# A tibble: 5 x 3
# chkA chkB chkC
# <lgl> <lgl> <lgl>
#1 FALSE FALSE FALSE
#2 TRUE FALSE FALSE
#3 FALSE TRUE FALSE
#4 TRUE FALSE FALSE
#5 FALSE FALSE TRUE
Or without an anonymous function:
library(purrr)
my_tibble %>%
select(starts_with('chk')) %>%
mutate(across(everything(), negate(is.na)))
# A tibble: 5 x 3
# chkA chkB chkC
# <lgl> <lgl> <lgl>
#1 FALSE FALSE FALSE
#2 TRUE FALSE FALSE
#3 FALSE TRUE FALSE
#4 TRUE FALSE FALSE
#5 FALSE FALSE TRUE
To create the column name based on which value is not NA, a vectorized option is max.col():
nm1 <- sub('chk', '', names(my_tibble))[max.col(!is.na(my_tibble), 'first')]
nm1[!rowSums(!is.na(my_tibble))] <- NA_character_
my_tibble$checks <- nm1
my_tibble
# A tibble: 5 x 4
# chkA chkB chkC checks
# <chr> <chr> <chr> <chr>
#1 <NA> <NA> <NA> <NA>
#2 x <NA> <NA> A
#3 <NA> x <NA> B
#4 x <NA> <NA> A
#5 <NA> <NA> x C
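Why the second line is needed: `max.col()` returns, for each row, the column holding the largest value, and with `ties.method = "first"` a row that is all `FALSE` (no `"x"` at all) is a tie that resolves to column 1. A small sketch on the question's data, rebuilt as a plain data frame (`filled`, `idx` and `checked` are illustrative names):

```r
chk <- data.frame(chkA = c(NA, "x", NA, "x", NA),
                  chkB = c(NA, NA, "x", NA, NA),
                  chkC = c(NA, NA, NA, NA, "x"))

filled <- !is.na(chk)                  # logical matrix, one row per observation
idx <- max.col(filled, "first")        # 1 1 2 1 3: row 1 is an all-FALSE tie
idx[rowSums(filled) == 0] <- NA        # so rows with no "x" must be reset
checked <- sub("chk", "", names(chk))[idx]
checked                                # NA "A" "B" "A" "C"
```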
An option in the tidyverse would be to first rename the columns by removing the 'chk' prefix, then use imap to loop over the columns and replace the non-NA elements with the column name, use coalesce to return the first non-NA value in each row, and finally add the result as the 'checked' column of 'my_tibble':
library(stringr)
my_tibble <- my_tibble %>%
rename_all(~ str_remove(., 'chk')) %>%
imap_dfc(~ case_when(!is.na(.x) ~ .y)) %>%
invoke(coalesce, .) %>%
mutate(my_tibble, checked = .)
-output
my_tibble
# A tibble: 5 x 4
# chkA chkB chkC checked
# <chr> <chr> <chr> <chr>
#1 <NA> <NA> <NA> <NA>
#2 x <NA> <NA> A
#3 <NA> x <NA> B
#4 x <NA> <NA> A
#5 <NA> <NA> x C
data
my_tibble <- structure(list(chkA = c(NA, "x", NA, "x", NA), chkB = c(NA, NA,
"x", NA, NA), chkC = c(NA, NA, NA, NA, "x")), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
One dplyr option for finding the column with x could be:
df %>%
rowwise() %>%
mutate(checked = names(.)[which(c_across(everything()) == "x")[1]])
chkA chkB chkC checked
<chr> <chr> <chr> <chr>
1 <NA> <NA> <NA> <NA>
2 x <NA> <NA> chkA
3 <NA> x <NA> chkB
4 x <NA> <NA> chkA
5 <NA> <NA> x chkC
I am somewhat stuck. Is there a better way than the approach below to do value matching within mutate() that treats NAs as "real values"?
library(dplyr)
data_foo <- data.frame(A= c(1:2, NA, 4, NA), B = c(1, 3, NA, NA, 4))
Not the desired output:
data_foo %>% mutate(irr = A==B)
#> A B irr
#> 1 1 1 TRUE
#> 2 2 3 FALSE
#> 3 NA NA NA
#> 4 4 NA NA
#> 5 NA 4 NA
data_foo %>% rowwise() %>% mutate(irr = A%in%B)
#> Source: local data frame [5 x 3]
#> Groups: <by row>
#>
#> # A tibble: 5 x 3
#> A B irr
#> <dbl> <dbl> <lgl>
#> 1 1 1 TRUE
#> 2 2 3 FALSE
#> 3 NA NA FALSE
#> 4 4 NA FALSE
#> 5 NA 4 FALSE
Desired output: the irr column below is what I want. I am using these somewhat cumbersome helper columns. Is there a shorter way?
data_foo %>%
mutate(NA_A = is.na(A),
NA_B = is.na(B),
irr = if_else(is.na(A)|is.na(B), NA_A == NA_B, A == B))
#> A B NA_A NA_B irr
#> 1 1 1 FALSE FALSE TRUE
#> 2 2 3 FALSE FALSE FALSE
#> 3 NA NA TRUE TRUE TRUE
#> 4 4 NA FALSE TRUE FALSE
#> 5 NA 4 TRUE FALSE FALSE
Using map2
library(tidyverse)
data_foo %>%
mutate(irr = map2_lgl(A, B, `%in%`))
# A B irr
#1 1 1 TRUE
#2 2 3 FALSE
#3 NA NA TRUE
#4 4 NA FALSE
#5 NA 4 FALSE
Or with setequal
data_foo %>%
rowwise %>%
mutate(irr = setequal(A, B))
The above methods are concise, but they loop over the rows. Alternatively, we can replace the NAs with a placeholder value (here -999) and then compare with ==:
data_foo %>%
mutate_all(list(new = ~ replace_na(., -999))) %>%
transmute(A, B, irr = A_new == B_new)
# A B irr
#1 1 1 TRUE
#2 2 3 FALSE
#3 NA NA TRUE
#4 4 NA FALSE
#5 NA 4 FALSE
Or with bind_cols and reduce
data_foo %>%
mutate_all(replace_na, -999) %>%
reduce(`==`) %>%
bind_cols(data_foo, irr = .)
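One caveat with the placeholder approach: it assumes `-999` never occurs in the data; if it does, a real `-999` compares equal to a missing value. A loop-free sketch that avoids the placeholder altogether (`eq_na` is a hypothetical helper name, and the sixth element is made up to trigger the problem):

```r
# Hypothetical helper: TRUE when both are NA, or both are non-NA and equal
eq_na <- function(a, b) {
  (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
}

A <- c(1, 2, NA, 4, NA, -999)
B <- c(1, 3, NA, NA, 4, NA)

# Placeholder approach: the last pair wrongly compares as equal
replace(A, is.na(A), -999) == replace(B, is.na(B), -999)
# TRUE FALSE TRUE FALSE FALSE TRUE

eq_na(A, B)
# TRUE FALSE TRUE FALSE FALSE FALSE
```

Since `eq_na()` is vectorized, it can be used directly as `mutate(irr = eq_na(A, B))`.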
Maybe simpler than akrun's answer?
Either of the two ways below will produce the expected result. Note that as.character() won't work, because as.character(NA) returns NA_character_, which still compares as NA.
data_foo %>%
mutate(irr = paste(A) == paste(B))
data_foo %>%
mutate(irr = sQuote(A) == sQuote(B))
# A B irr
#1 1 1 TRUE
#2 2 3 FALSE
#3 NA NA TRUE
#4 4 NA FALSE
#5 NA 4 FALSE
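The difference between the two conversions, plus one caveat of the `paste()` trick for character columns, in a few lines of base R:

```r
as.character(NA) == as.character(NA)   # NA: as.character() keeps the missing value
paste(NA) == paste(NA)                 # TRUE: paste() turns NA into the string "NA"

# Caveat for character data: a literal "NA" string then matches a missing value
paste(NA_character_) == paste("NA")    # TRUE
```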
Edit.
Following the comments below I have updated the code and it now follows akrun's suggestion.
There is also the excellent idea in tmfmnk's answer. I use a similar one in yet another way of solving the question's problem.
The documentation of all.equal says that
Do not use all.equal directly in if expressions—either use
isTRUE(all.equal(....)) or identical if appropriate.
Though there is no if expression in mutate, I believe all.equal() is more robust than identical() for numbers, because it tolerates tiny floating-point differences, and it has the same effect when the values being compared are (essentially or exactly) equal.
data_foo %>%
rowwise() %>%
mutate(irr = isTRUE(all.equal(A, B))) %>%
ungroup()
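Two properties of `all.equal()` matter here: it returns a single result for whole vectors, so it has to be applied per element (via `rowwise()` or, in base R, `mapply()`), and it compares numbers with a small tolerance, which is what makes it more forgiving than `identical()`. A base-R sketch on the question's values:

```r
A <- c(1, 2, NA, 4, NA)
B <- c(1, 3, NA, NA, 4)

# Whole vectors: one result (TRUE or a text description), not one per row
isTRUE(all.equal(A, B))   # FALSE

# Element by element: NA vs NA counts as equal, NA vs a number does not
irr <- mapply(function(a, b) isTRUE(all.equal(a, b)), A, B)
irr                       # TRUE FALSE TRUE FALSE FALSE

# Tolerance: all.equal() ignores tiny floating-point differences, identical() does not
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
identical(0.1 + 0.2, 0.3)           # FALSE
```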
Another possibility:
data_foo %>%
rowwise() %>%
mutate(irr = identical(A, B)) %>%
ungroup()
A B irr
<dbl> <dbl> <lgl>
1 1 1 TRUE
2 2 3 FALSE
3 NA NA TRUE
4 4 NA FALSE
5 NA 4 FALSE
coalesce() is useful when you want a fallback value wherever a result is NA:
data_foo %>%
mutate(irr = coalesce(A == B, is.na(A) & is.na(B)))
# A B irr
# 1 1 1 TRUE
# 2 2 3 FALSE
# 3 NA NA TRUE
# 4 4 NA FALSE
# 5 NA 4 FALSE
The same idea for more than two columns, comparing every column with the first:
data_foo %>%
mutate(irr = coalesce(reduce(map(.[-1], `==`, .[[1]]), `&`), rowMeans(is.na(.)) == 1))
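A base-R sketch of the same rule for any number of columns (`all_equal_na` is a hypothetical helper, and `data3` is made-up example data with a third column): a row counts as matching if it is entirely NA, or has no NA and every value equals the first column's.

```r
# Hypothetical helper: TRUE when a row is all NA, or has no NA and all values equal
all_equal_na <- function(df) {
  m <- as.matrix(df)
  n_na <- rowSums(is.na(m))
  # Compare every column with the first one (the vector recycles down each column)
  same <- rowSums(m != m[, 1], na.rm = TRUE) == 0
  n_na == ncol(m) | (n_na == 0 & same)
}

data3 <- data.frame(A = c(1, 2, NA, 4, NA, 5),
                    B = c(1, 3, NA, NA, 4, 5),
                    C = c(1, 2, NA, 4, 4, 5))
all_equal_na(data3)
# TRUE FALSE TRUE FALSE FALSE TRUE
```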