purrr: using %in% with a list-column - r

I have a column of question responses and a column of possible correct_answers. I'd like to create a third (logical) column (correct) to show whether a response matches one of the possible correct answers.
I think I may need to use a purrr function but I'm not sure how to use one of the map functions with %in%, for example.
library(tibble)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
data <- tibble(
response = c('a', 'b', 'c'),
correct_answers = rep(list(c('a', 'b')), 3)
)
# works but correct answers specified manually
data %>%
mutate(correct = response %in% c('a', 'b'))
#> # A tibble: 3 x 3
#> response correct_answers correct
#> <chr> <list> <lgl>
#> 1 a <chr [2]> TRUE
#> 2 b <chr [2]> TRUE
#> 3 c <chr [2]> FALSE
# doesn't work
data %>%
mutate(correct = response %in% correct_answers)
#> # A tibble: 3 x 3
#> response correct_answers correct
#> <chr> <list> <lgl>
#> 1 a <chr [2]> FALSE
#> 2 b <chr [2]> FALSE
#> 3 c <chr [2]> FALSE
Created on 2018-11-05 by the reprex package (v0.2.1)

%in% doesn't check nested elements inside a list. Use mapply (base R) or map2 (purrr) to loop through the two columns in parallel and check each pair:
data %>% mutate(correct = mapply(function (res, ans) res %in% ans, response, correct_answers))
# A tibble: 3 x 3
# response correct_answers correct
# <chr> <list> <lgl>
#1 a <chr [2]> TRUE
#2 b <chr [2]> TRUE
#3 c <chr [2]> FALSE
Use map2_lgl:
library(purrr)
data %>% mutate(correct = map2_lgl(response, correct_answers, ~ .x %in% .y))
# A tibble: 3 x 3
# response correct_answers correct
# <chr> <list> <lgl>
#1 a <chr [2]> TRUE
#2 b <chr [2]> TRUE
#3 c <chr [2]> FALSE
Or, as @thelatemail commented, both can be simplified:
data %>% mutate(correct = mapply(`%in%`, response, correct_answers))
data %>% mutate(correct = map2_lgl(response, correct_answers, `%in%`))
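To see why the pairwise loop is needed, here is a minimal base R sketch using the same values as the question, held in plain vectors rather than tibble columns (that simplification is an assumption made for illustration). When the right-hand side of %in% is a list, each list element is treated as one whole value, so a single letter never matches it.

```r
# Same values as the question, as standalone objects
response <- c("a", "b", "c")
correct_answers <- rep(list(c("a", "b")), 3)

# Compares each response against whole list elements: nothing matches
whole <- response %in% correct_answers

# Compares response[i] against the contents of correct_answers[[i]]
pairwise <- vapply(
  seq_along(response),
  function(i) response[i] %in% correct_answers[[i]],
  logical(1)
)

print(whole)
print(pairwise)
```

vapply with logical(1) plays the same role here as map2_lgl: it fails loudly if any comparison returns something other than a single TRUE or FALSE.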

Related

Remove empty lists from a tibble in R

I am trying to remove any list from my tibble that has "<chr [0]>"
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df
#> # A tibble: 3 × 3
#> x y z
#> <int> <list> <list>
#> 1 1 <chr [0]> <chr [1]>
#> 2 2 <chr [0]> <chr [1]>
#> 3 3 <chr [0]> <chr [0]>
Created on 2022-02-15 by the reprex package (v2.0.1)
I want my tibble to look like this
#> # A tibble: 3 × 3
#> x z
#> <int> <list>
#> 1 1 <chr [1]>
#> 2 2 <chr [1]>
#> 3 3 NA
any help is appreciated
You can do:
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(z = lapply(z, function(x) ifelse(length(x) == 0, NA, x)))
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <lgl [1]>
Note that in your z column you can't have list elements for rows 1 and 2 and a bare logical NA for row 3. The whole column needs to be a list.
If all elements of z have only one element, you can add another line of code with mutate(z = unlist(z)).
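As a rough illustration of that last point, the sketch below works on a standalone list shaped like the z column (not on the tibble itself, which is a simplification). It fills the empty element with NA_character_ instead of a logical NA, which is a small variation on the answer above, so that flattening yields a plain character vector.

```r
# A list shaped like the z column from the question
z <- list("ATC", "TAC", character())

# lengths() gives the length of every element; replace the empty ones
z[lengths(z) == 0] <- list(NA_character_)

# Every element now has length 1, so the list can be flattened safely
z_flat <- unlist(z)
print(z_flat)
```

If some element held more than one value, unlist would return a vector longer than the number of rows, so this step only makes sense when every element has length one.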
The OP asked for a more dynamic solution that handles several columns.
Here is an example where I simply created another z2 variable. Generally, you can repeat the recoding for several columns using across.
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()),
z2 = z)
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(across(starts_with('z'), ~ lapply(., function(x) ifelse(length(x) == 0, NA, x))))
Which gives:
# A tibble: 3 x 3
x z z2
<int> <list> <list>
1 1 <chr [1]> <chr [1]>
2 2 <chr [1]> <chr [1]>
3 3 <lgl [1]> <lgl [1]>
A two-step way using base R:
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df <- df[apply(df, 2, function(x) any(lapply(x, length) > 0))] #Remove empty columns
df[apply(df, 2, function(x) lapply(x, length) == 0)] <- NA #Replace empty lists with NA
df
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <NULL>

Find differences in character column in R

I have a dataframe with ICPM codes before and after recoding of an operation.
df1 <- tibble::tribble(~ops, ~opsalt,
"8-915, 5-847.32", "5-847.32, 5-852.f3, 8-915",
"8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81", "5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915",
"5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e", "5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1",
"8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d", "5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915")
I want to calculate two columns which contains the differing codes between the two columns.
For the first row the difference between ops and opsalt would be character(0).
The difference between opsalt and ops would be 5-852.f3.
Tried:
df <- df %>% mutate(ops = strsplit(ops,",")) %>%
mutate(opsalt =strsplit(opsalt,","))
df <- df %>% rowwise() %>% mutate(neu_alt = list(setdiff(ops,opsalt))) %>% mutate(alt_neu = list(setdiff(opsalt,ops)))
This didn't work, because I want to compare parts of the respective strings and not the whole string.
It should work if you use ", " in strsplit and df1 in your first mutate call.
library(dplyr)
df1 %>%
mutate(across(.fns = ~ strsplit(.x, ", "))) %>%
rowwise %>%
mutate(neu_alt = list(setdiff(ops, opsalt)),
alt_neu = list(setdiff(opsalt, ops)))
#> # A tibble: 4 x 4
#> # Rowwise:
#> ops opsalt neu_alt alt_neu
#> <list> <list> <list> <list>
#> 1 <chr [2]> <chr [3]> <chr [0]> <chr [1]>
#> 2 <chr [6]> <chr [7]> <chr [0]> <chr [1]>
#> 3 <chr [5]> <chr [7]> <chr [0]> <chr [2]>
#> 4 <chr [10]> <chr [10]> <chr [1]> <chr [1]>
Created on 2022-01-04 by the reprex package (v0.3.0)
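The effect of the separator can be checked in isolation. This is a small base R sketch using only the first row of df1, stored in two plain strings: splitting on "," alone leaves a leading space on every code after the first, so codes that look identical no longer compare equal.

```r
# First row of df1 from the question
ops    <- "8-915, 5-847.32"
opsalt <- "5-847.32, 5-852.f3, 8-915"

# Splitting on "," keeps the space: "8-915" vs " 8-915" are different strings
bad_a <- strsplit(ops, ",")[[1]]
bad_b <- strsplit(opsalt, ",")[[1]]
bad_diff <- setdiff(bad_a, bad_b)

# Splitting on ", " yields clean codes
good_a <- strsplit(ops, ", ")[[1]]
good_b <- strsplit(opsalt, ", ")[[1]]
neu_alt <- setdiff(good_a, good_b)
alt_neu <- setdiff(good_b, good_a)

print(bad_diff)
print(neu_alt)
print(alt_neu)
```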
If you want to keep them as strings, you can try this method. If you intend to do similar operations repeatedly, then I suggest retaining the list-columns (instead of repeatedly strsplit-ing them).
df1 %>%
mutate(
d = mapply(function(...) toString(setdiff(...)),
strsplit(ops, "[ ,]+"), strsplit(opsalt, "[ ,]+"))
)
# # A tibble: 4 x 3
# ops opsalt d
# <chr> <chr> <chr>
# 1 8-915, 5-847.32 5-847.32, 5-852.f3, 8-915 ""
# 2 8-915, 5-781.30, 8-919, 5-807.4, 5-800.c1, 5-79b.81 5-79b.81, 5-800.c1, 5-805.y, 5-807.4, 8-919, 5-781.30, 8-915 ""
# 3 5-786.1, 5-808.a4, 5-784.1u, 5-783.2d, 5-788.5e 5-788.5e, 5-783.2d, 5-780.4d, 5-784.7d, 5-784.1u, 5-808.a4, 5-786.1 ""
# 4 8-915, 5-784.0v, 5-788.5f, 5-788.40, 5-808.b0, 5-786.k, 5-788.60, 5-788.00, 5-786.0, 5-783.2d 5-788.00, 5-788.60, 5-786.0, 5-786.k, 5-788.40, 5-808.b0, 5-788.5f, 5-781.ad, 5-784.0v, 8-915 "5-783.2d"
(I recommend using list-columns, though, as demonstrated in TimTeaFan's answer.)

Removing NULL values from the table

I have a table with NULL values in one column, and those NULL values add an extra label to the Highchart graph. How can I use dplyr to drop the rows that have NULL values in that specific column?
I was thinking of changing the backend SQL queries to filter the result for the desired output, but that is not an appropriate approach.
This is not working:
dplyr::filter(!is.na(ColumnWithNullValues)) %>%
Actual code:
df <- data() %>%
dplyr::filter(CreatedBy == 'owner') %>%
dplyr::group_by(`Reason for creation`) %>%
dplyr::arrange(ReasonOrder) %>%
# ColumnWithNullValues is the column that has NULL values.
Here is one option with base R
df[!sapply(df$ColumnWithNullValues, is.null),]
data
library(tibble)
df <- tibble(
ColumnWithNullValues = list(c(1:5), NULL, c(6:10)))
Here is a small example df:
library(dplyr)
library(purrr)
df <- tibble(
ColumnWithNullValues = list(c(1:5), NULL, c(6:10))
)
df
#> # A tibble: 3 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <NULL>
#> 3 <int [5]>
In this case the most logical approach might seem to be:
df %>%
filter(!is.null(ColumnWithNullValues))
#> # A tibble: 3 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <NULL>
#> 3 <int [5]>
But as you can see, that does not work. Instead, we need to use map/sapply/vapply to get inside the list. For example like this:
df %>%
filter(map_lgl(ColumnWithNullValues, function(x) !all(is.null(x))))
#> # A tibble: 2 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <int [5]>
But as @akrun has explained in the comments, a list element cannot contain NULL alongside other values. So we can simplify the code to this:
df %>%
filter(!map_lgl(ColumnWithNullValues, is.null))
#> # A tibble: 2 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <int [5]>
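The difference between the two filters comes down to what is.null sees. A minimal base R sketch, using a standalone list with the same contents as the example column:

```r
# Same contents as ColumnWithNullValues
col <- list(1:5, NULL, 6:10)

# Called on the whole column, is.null returns one value: the list itself is not NULL
whole <- is.null(col)

# Called element by element, it flags the NULL entry
keep <- !vapply(col, is.null, logical(1))
kept <- col[keep]

print(whole)
print(keep)
```

Because the whole-column call yields a single FALSE, negating it gives a single TRUE that filter recycles across all rows, which is why nothing is dropped.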

Storing and calling variables in a column in dplyr within a function

I want to store some variables within a column cell within a tibble. I then want to call that column and either paste the names of those variables or call that column and paste the columns which those variables correspond to together. In addition, all of this occurs within a function and this is the only piece of hard coding left so I'd really like to find a way to solve this.
library("tidyverse")
myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))
myData%>%mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
c1 c2 c3 factors factors1
<chr> <chr> <chr> <list> <fct>
1 a 1 A <chr [2]> c1:c2:c1
2 b 2 B <chr [2]> c2:c3:c2
3 c 3 C <chr [3]> c1:c2:c3
So this allows me to concatenate the names of the variables but as you can see, if one list is longer than the others, it loops.
For the second problem in which I would like to use the $factors column to specifically call the values of other columns, I can hardcode this like so:
myData%>%
mutate(factors2=interaction(!!!syms(c("c1","c2")),sep=":",lex.order=TRUE))
# A tibble: 3 x 5
c1 c2 c3 factors factors2
<chr> <chr> <chr> <list> <fct>
1 a 1 A <chr [2]> a:1
2 b 2 B <chr [2]> b:2
3 c 3 C <chr [3]> c:3
However if I try this:
myData%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))
Error in lapply(.x, .f, ...) : object 'factors' not found
The same happens if I try to unlist the factors or use other rlang expressions. I have also tried nesting rlang expressions but so far haven't found one that works as I intended.
I feel like this is something that should be possible but so far I haven't found a question on stack overflow or a tutorial that indicates that it is so maybe I'm on a wild goose chase. Thank you all for your time and help.
My code in full:
library("tidyverse")
myData<-tibble("c1"=c("a","b","c"),
"c2"=c("1","2","3"),
"c3"=c("A","B","C"),
factors=c(list(c("c1","c2")),list(c("c2","c3")),list(c("c1","c2","c3"))))%>%
mutate(factors1=interaction(!!!quos(factors),sep=":",lex.order=TRUE))%>%
mutate(factors2=interaction(!!!syms(factors),sep=":",lex.order=TRUE))
My desired output is:
# A tibble: 3 x 6
c1 c2 c3 factors factors1 factors2
<chr> <chr> <chr> <list> <fct> <fct>
1 a 1 A <chr [2]> c1:c2 a:1
2 b 2 B <chr [2]> c2:c3 2:B
3 c 3 C <chr [3]> c1:c2:c3 c:3:C
Here is a method using map and imap:
library(tidyverse)
myData %>%
mutate(factor1 = factors %>% map(~interaction(as.list(.), sep=':', lex.order = TRUE)) %>% unlist(),
factor2 = factors %>% imap(~interaction(myData[.y, match(.x, names(myData))], sep=":", lex.order = TRUE)) %>% unlist())
For factor1, instead of splicing the arguments into dots, I pass a list into interaction.
For factor2, I match factors in each row with the names in myData and use the column index (match(.x, names(myData))) in combination with the row index (.y from imap) to subset the appropriate elements to feed into interaction.
Both factor1 and factor2 require an unlist because map and imap return lists.
Output:
# A tibble: 3 x 6
c1 c2 c3 factors factor1 factor2
<chr> <chr> <chr> <list> <fct> <fct>
1 a 1 A <chr [2]> c1:c2 a:1
2 b 2 B <chr [2]> c2:c3 2:B
3 c 3 C <chr [3]> c1:c2:c3 c:3:C
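The row-and-column lookup can also be sketched in base R. Note that this swaps interaction for paste, so it returns character vectors rather than factors, and it uses a plain data.frame with the factors held in a separate list; both are simplifications made for illustration, not the approach of the answer above.

```r
myData <- data.frame(
  c1 = c("a", "b", "c"),
  c2 = c("1", "2", "3"),
  c3 = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
factors <- list(c("c1", "c2"), c("c2", "c3"), c("c1", "c2", "c3"))

# Join the variable names stored for each row
factors1 <- vapply(factors, paste, character(1), collapse = ":")

# For row i, pick the columns named in factors[[i]] and join their values
factors2 <- vapply(
  seq_along(factors),
  function(i) paste(unlist(myData[i, factors[[i]]]), collapse = ":"),
  character(1)
)

print(factors1)
print(factors2)
```

Wrapping either result in factor() would give a factor column, although the level ordering would not necessarily match what interaction with lex.order = TRUE produces.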
Your first question can be addressed with the purrr::map and purrr::lift families of functions:
myData %>%
mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
mutate_at( "factors1", lift(fct_c) )
# # A tibble: 3 x 5
# c1 c2 c3 factors factors1
# <chr> <chr> <chr> <list> <fct>
# 1 a 1 A <chr [2]> c1:c2
# 2 b 2 B <chr [2]> c2:c3
# 3 c 3 C <chr [3]> c1:c2:c3
The second question is more tricky, because !!! causes the evaluation of its argument immediately, which can sometimes lead to unintuitive operator precedence inside a dplyr chain. The cleanest way is to define a standalone function that composes your interaction expressions:
f <- function(fct) {expr( interaction(!!!syms(fct), sep=":", lex.order=TRUE) )}
# Example usage
f( myData$factors[[1]] ) # interaction(c1, c2, sep = ":", lex.order = TRUE)
f( myData$factors[[2]] ) # interaction(c2, c3, sep = ":", lex.order = TRUE)
myData %>% mutate( e = map(factors, f) )
# # A tibble: 3 x 5
# c1 c2 c3 factors e
# <chr> <chr> <chr> <list> <list>
# 1 a 1 A <chr [2]> <language>
# 2 b 2 B <chr [2]> <language>
# 3 c 3 C <chr [3]> <language>
Unfortunately, we can't evaluate e directly, because it will feed the entire columns c1, c2, and c3 to the expressions, whereas you only want a single value that is in the same row as the expression. For this reason, we need to encapsulate columns c1 through c3 in a row-wise fashion.
X <- myData %>% mutate( e = map(factors, f) ) %>%
rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup()
# # A tibble: 3 x 6
# c1 c2 c3 factors e d
# <chr> <chr> <chr> <list> <list> <list>
# 1 a 1 A <chr [2]> <language> <tibble [1 × 3]>
# 2 b 2 B <chr [2]> <language> <tibble [1 × 3]>
# 3 c 3 C <chr [3]> <language> <tibble [1 × 3]>
Now you have expressions in e that need to be applied to data in d, so it's just a simple map2 traversal from here. Putting everything together and cleaning up, we get:
myData %>%
mutate( factors1 = map(factors, lift_dv(interaction, sep=":", lex.order=TRUE)) ) %>%
mutate( e = map(factors, f) ) %>%
rowwise() %>% mutate( d = list(tibble(c1,c2,c3)) ) %>% ungroup() %>%
mutate( factors2 = map2( e, d, rlang::eval_tidy ) ) %>%
mutate_at( vars(factors1,factors2), lift(fct_c) ) %>%
select( -e, -d )
# # A tibble: 3 x 6
# c1 c2 c3 factors factors1 factors2
# <chr> <chr> <chr> <list> <fct> <fct>
# 1 a 1 A <chr [2]> c1:c2 a:1
# 2 b 2 B <chr [2]> c2:c3 2:B
# 3 c 3 C <chr [3]> c1:c2:c3 c:3:C

purrr::possibly function possibly not working with map2_chr function

I suspect that this is a bug in the purrr package, but would like to check my logic on Stack Overflow first, please.
It seems to me that the possibly function is not working inside the map2_chr function. I'm using purrr version 0.2.5
Consider this example:
library(dplyr)
library(purrr)
lets <- tibble(posn = 2:0,
lets_list = list(letters[1:5], letters[1:5], letters[1:5])) %>%
glimpse()
returns
Observations: 3
Variables: 2
$ posn <int> 2, 1, 0
$ lets_list <list> [<"a", "b", "c", "d", "e">, <"a", "b", "c", "d", "e">, <"a", "b", "c", "d", "e">]
In this example, I want to create another column using mutate to return the element in the list "lets_list" based on the value in "posn".
lets %>%
mutate(lets_sel = map2_chr(lets_list, posn, ~.x[.y]))
fails with this error message because the third row has posn = 0.
> lets %>%
+ mutate(lets_sel = map2_chr(lets_list, posn, ~.x[.y]))
# Error in mutate_impl(.data, dots) :
# Evaluation error: Result 3 is not a length 1 atomic vector.
Using the possibly function with map2_chr returns an error too.
lets %>%
mutate(lets_sel = map2_chr(lets_list, posn, possibly(~.x[.y], NA_character_)))
# Error in mutate_impl(.data, dots) :
# Evaluation error: Result 3 is not a length 1 atomic vector.
However, the map2 function works fine:
> lets %>%
+ mutate(lets_sel = map2(lets_list, posn, possibly(~.x[.y], NA_character_)))
# A tibble: 3 x 3
posn lets_list lets_sel
<int> <list> <list>
1 2 <chr [5]> <chr [1]>
2 1 <chr [5]> <chr [1]>
3 0 <chr [5]> <chr [0]>
A workaround solution is to use map2 and then map_chr, but I suspect that this is a bug.
> lets %>%
+ mutate(lets_sel = map2(lets_list, posn, ~.x[.y]),
+ lets_sel = map_chr(lets_sel, possibly(~.x[1], NA_character_)))
# A tibble: 3 x 3
posn lets_list lets_sel
<int> <list> <chr>
1 2 <chr [5]> b
2 1 <chr [5]> a
3 0 <chr [5]> NA
Am I missing something here?
Thanks.
OK, now I'm thinking that this is just a "feature". The most elegant solution / workaround is just:
lets %>%
mutate(lets_sel = map2(lets_list, posn, ~.x[.y]) %>%
map_chr(., possibly(~.x[1], NA_character_)))
Nowhere does the help page suggest that safely and possibly can be used with the map2 family of functions. Hence I conclude that this is a "feature" rather than a "bug".
Thanks.
possibly() doesn't work because indexing with 0 doesn't throw an error;
it just returns a length 0 vector:
nth_letter <- function(n) letters[n]
possibly(nth_letter, "not returned")(0)
#> character(0)
nth_letter(0)
#> character(0)
In this case it would probably be easier to replace invalid indices with NA
(using e.g. dplyr::na_if(), or plain old ifelse if the real problem is more complex) to get what you are after:
lets %>%
mutate(lets_sel = map2_chr(lets_list, na_if(posn, 0), ~ .x[.y]))
#> # A tibble: 3 x 3
#> posn lets_list lets_sel
#> <int> <list> <chr>
#> 1 2 <chr [5]> b
#> 2 1 <chr [5]> a
#> 3 0 <chr [5]> <NA>
Created on 2018-08-07 by the reprex package (v0.2.0.9000).
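The same behaviour can be reproduced without purrr. In this base R sketch, tryCatch stands in for possibly (an analogy for illustration, not purrr's implementation): the fallback is never used because indexing with 0 succeeds, whereas indexing with NA returns a length-one NA that fits a character result.

```r
# Indexing with 0 is not an error, so the error handler never runs
zero_result <- tryCatch(letters[0], error = function(e) "fallback")

# Replace the invalid index with NA before subsetting
posn <- c(2L, 1L, 0L)
posn[posn == 0L] <- NA

# Each lookup now returns exactly one string (possibly NA)
lets_sel <- vapply(posn, function(i) letters[i], character(1))

print(zero_result)
print(lets_sel)
```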
