purrr::possibly function possibly not working with map2_chr function - r

I suspect that this is a bug in the purrr package, but would like to check my logic on Stack Overflow first, please.
It seems to me that the possibly function is not working inside the map2_chr function. I'm using purrr version 0.2.5.
Consider this example:
library(dplyr)
library(purrr)
lets <- tibble(posn = 2:0,
               lets_list = list(letters[1:5], letters[1:5], letters[1:5])) %>%
  glimpse()
returns
Observations: 3
Variables: 2
$ posn <int> 2, 1, 0
$ lets_list <list> [<"a", "b", "c", "d", "e">, <"a", "b", "c", "d", "e">, <"a", "b", "c", "d", "e">]
In this example, I want to create another column using mutate to return the element in the list "lets_list" based on the value in "posn".
lets %>%
  mutate(lets_sel = map2_chr(lets_list, posn, ~.x[.y]))
fails with this error message because the third row has posn = 0.
> lets %>%
+ mutate(lets_sel = map2_chr(lets_list, posn, ~.x[.y]))
# Error in mutate_impl(.data, dots) :
# Evaluation error: Result 3 is not a length 1 atomic vector.
Using the possibly function with map2_chr returns an error too.
lets %>%
  mutate(lets_sel = map2_chr(lets_list, posn, possibly(~.x[.y], NA_character_)))
# Error in mutate_impl(.data, dots) :
# Evaluation error: Result 3 is not a length 1 atomic vector.
However, the map2 function works fine:
> lets %>%
+ mutate(lets_sel = map2(lets_list, posn, possibly(~.x[.y], NA_character_)))
# A tibble: 3 x 3
posn lets_list lets_sel
<int> <list> <list>
1 2 <chr [5]> <chr [1]>
2 1 <chr [5]> <chr [1]>
3 0 <chr [5]> <chr [0]>
A workaround solution is to use map2 and then map_chr, but I suspect that this is a bug.
> lets %>%
+ mutate(lets_sel = map2(lets_list, posn, ~.x[.y]),
+ lets_sel = map_chr(lets_sel, possibly(~.x[1], NA_character_)))
# A tibble: 3 x 3
posn lets_list lets_sel
<int> <list> <chr>
1 2 <chr [5]> b
2 1 <chr [5]> a
3 0 <chr [5]> NA
Am I missing something here?
Thanks.

OK, now I'm thinking that this is just a "feature". The most elegant solution / workaround is just:
lets %>%
  mutate(lets_sel = map2(lets_list, posn, ~.x[.y]) %>%
           map_chr(., possibly(~.x[1], NA_character_)))
Nothing in the help page suggests that safely and possibly can be used with the map2 family of functions. Hence I conclude that this is a "feature" rather than a "bug".
Thanks.

possibly() doesn't work because indexing with 0 doesn't throw an error;
it just returns a length 0 vector:
nth_letter <- function(n) letters[n]
possibly(nth_letter, "not returned")(0)
#> character(0)
nth_letter(0)
#> character(0)
In this case it would probably be easier to replace invalid indices with NA
(using e.g. dplyr::na_if(), or plain old ifelse if the real problem is more complex) to get what you are after:
lets %>%
  mutate(lets_sel = map2_chr(lets_list, na_if(posn, 0), ~ .x[.y]))
#> # A tibble: 3 x 3
#> posn lets_list lets_sel
#> <int> <list> <chr>
#> 1 2 <chr [5]> b
#> 2 1 <chr [5]> a
#> 3 0 <chr [5]> <NA>
Created on 2018-08-07 by the reprex package (v0.2.0.9000).
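The point made above, that possibly() only substitutes its default when the wrapped function signals an error, can be checked in isolation. The following is a minimal sketch; safe_log, pick and pick_or_na are hypothetical helper names introduced only for illustration, and the explicit length check is just one possible alternative to na_if().

```r
library(purrr)

# possibly() replaces the result only when the wrapped function throws an error.
safe_log <- possibly(log, otherwise = NA_real_)
err_result <- safe_log("a")   # log("a") errors, so the default is returned

# Indexing with 0 is not an error, so the default is never used.
pick <- possibly(function(x, i) x[i], otherwise = NA_character_)
zero_result <- pick(letters, 0L)   # a length-0 character vector

# Hypothetical helper: treat an empty result as missing explicitly.
pick_or_na <- function(x, i) {
  out <- x[i]
  if (length(out) == 0L) NA_character_ else out
}

lets_list <- list(letters[1:5], letters[1:5], letters[1:5])
sel <- map2_chr(lets_list, 2:0, pick_or_na)
```

Here sel has one string per row, with a missing value where posn is 0, which is the shape map2_chr requires.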

Related

dplyr: Why do some operations work "rowwise" without calling rowwise() and others dont?

I am still trying to figure out, how rowwise works exactly in R/dplyr.
For example I have this code:
library(dplyr)
df = data.frame(
  group = c("a", "a", "a", "b", "b", "c"),
  var1 = 1:6,
  var2 = 7:12
)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group), # works on rows
    meanNotRW = mean(c(var1, var2)), # does not work on rows
    charsNotRW = strsplit(concatNotRW, "-") # works on rows
  ) %>%
  rowwise() %>%
  mutate(
    concatRW = paste0(var1, "-", group), # all work on rows
    meanRW = mean(c(var1, var2)),
    charsRW = strsplit(concatRW, "-")
  ) -> res
The res dataframe looks like this:
group var1 var2 concatNotRW meanNotRW charsNotRW concatRW meanRW charsRW
<chr> <int> <int> <chr> <dbl> <list> <chr> <dbl> <list>
1 a 1 7 1-a 6.5 <chr [2]> 1-a 4 <chr [2]>
2 a 2 8 2-a 6.5 <chr [2]> 2-a 5 <chr [2]>
3 a 3 9 3-a 6.5 <chr [2]> 3-a 6 <chr [2]>
4 b 4 10 4-b 6.5 <chr [2]> 4-b 7 <chr [2]>
5 b 5 11 5-b 6.5 <chr [2]> 5-b 8 <chr [2]>
6 c 6 12 6-c 6.5 <chr [2]> 6-c 9 <chr [2]>
What I do not understand is why paste0 can take each cell of a row and paste them together (essentially performing a rowwise operation), yet mean can't do that. What am I missing, and are there any rules on what already works rowwise without the call to rowwise()? I did not find much info in the rowwise() vignette here: https://dplyr.tidyverse.org/articles/rowwise.html
paste takes several vectors through its variadic argument (...) and returns a vector of the same length as its inputs, whereas mean averages only its first argument and uses the variadic argument for other options (trim etc.), returning a single value. Here we need rowMeans. As for strsplit, it returns a list with one element of split pieces per input string.
library(dplyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2))),
    charsNotRW = strsplit(concatNotRW, "-")
  )
> mean(c(1:5, 6:10))
[1] 5.5
Note that the vector we are passing is a single vector by concatenating both vectors 1:5 and 6:10
whereas
> paste(1:5, 6:10)
[1] "1 6" "2 7" "3 8" "4 9" "5 10"
are two vectors passed into paste
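The same contrast can be reproduced with the question's own columns in base R. This is only a sketch: var1 and var2 are recreated here as plain vectors, and cbind() stands in for the data frame.

```r
var1 <- 1:6
var2 <- 7:12

# c() glues the two columns into one vector of length 12,
# so mean() returns a single number for the whole data set.
overall_mean <- mean(c(var1, var2))

# rowMeans() keeps the columns side by side and returns one mean per row.
row_means <- rowMeans(cbind(var1, var2))

# paste0() works element by element, so it already yields one value per row.
pasted <- paste0(var1, "-", var2)
```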
For splitting the column into two columns, we can use separate
library(tidyr)
df %>%
  mutate(
    concatNotRW = paste0(var1, "-", group),
    meanNotRW = rowMeans(across(c(var1, var2)))) %>%
  separate(concatNotRW, into = c("ind", "chars"))
group var1 var2 ind chars meanNotRW
1 a 1 7 1 a 4
2 a 2 8 2 a 5
3 a 3 9 3 a 6
4 b 4 10 4 b 7
5 b 5 11 5 b 8
6 c 6 12 6 c 9
Whether an operation works row by row without rowwise() depends on the function. If the function is vectorized, it works on the whole column and doesn't need rowwise. Here, both paste and mean accept vectors, but paste is vectorized over its variadic inputs and returns one result per element, whereas mean takes a single vector and collapses it to a single value. Suppose we have a function that checks each value with if/else; it is not vectorized, because if/else expects a single logical value. In that case, we can use either rowwise or Vectorize the function.
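As a rough illustration of that last case, here is a sketch with a made-up function (size_label is hypothetical, not from the question) that relies on if/else and is then wrapped with Vectorize():

```r
# Hypothetical function: if/else needs a single TRUE/FALSE, so it is not vectorized.
size_label <- function(x) if (x > 3) "big" else "small"

# Vectorize() wraps it so that it is called once per element (via mapply()).
size_label_v <- Vectorize(size_label)
labels <- size_label_v(1:6)

# ifelse() is the vectorized counterpart of if/else and needs no wrapper.
labels2 <- ifelse(1:6 > 3, "big", "small")
```

Inside mutate(), either size_label_v(var1) or the ifelse() form would then work on the whole column without rowwise().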

extracting items from list - how to account for character(0)

I am trying to extract the last element from the list nuts. In one row, however, the content is character(0). Hence, the extraction of the last element fails. I am struggling to control for the presence of character(0). Any help? Many thanks.
library(tidyverse)
my_df <- tibble(
  txt=c("chestnut, pear, kiwi, peanut",
        "grapes, banana"))
#Extract all nuts
my_df <- my_df %>%
  mutate(nuts=str_extract_all(txt, regex("\\w*nut\\w*")))
#there were no nuts in the second row; hence character(0)
my_df$nuts
#> [[1]]
#> [1] "chestnut" "peanut"
#>
#> [[2]]
#> character(0)
#now i want to extract the last element from the list; doesn't work
my_df %>%
mutate(last_item=map_chr(nuts, ~tail(.x, 1)))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `last_item`.
#> i `last_item = map_chr(nuts, ~tail(.x, 1))`.
#> x Result 2 must be a single string, not a character vector of length 0
#> Caused by error in `stop_bad_type()`:
#> ! Result 2 must be a single string, not a character vector of length 0
#the reason for the failure is the second row with character(0), the other row works,
my_df %>%
slice(., 1) %>%
mutate(last_item=map_chr(nuts, ~tail(.x, 1)))
#> # A tibble: 1 x 3
#> txt nuts last_item
#> <chr> <list> <chr>
#> 1 chestnut, pear, kiwi, peanut <chr [2]> peanut
#how to make analysis account for the presence of character(0);
#Attempt 1: purrr::possibly doesn't work either
my_df %>%
slice(., 1) %>%
mutate(last_item=map_chr(nuts, ~purrr::possibly(tail(.x, 1),
otherwise="NA")))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `last_item`.
#> i `last_item = map_chr(nuts, ~purrr::possibly(tail(.x, 1), otherwise = "NA"))`.
#> x Can't coerce element 1 from a closure to a character
#> Caused by error:
#> ! Can't coerce element 1 from a closure to a character
#Attempt 2: Circumvent the issue by taking the length of the list into consideration;
#but my map - command doesn't work now.
my_df %>%
mutate(list_length=map_dbl(nuts, length)) %>%
mutate(last_item=case_when(
list_length>0 ~ ~map_chr(nuts, ~tail(.x, 1)),
list_length==0 ~ NA_character_))
#> Error in `mutate_cols()`:
#> ! Problem with `mutate()` column `last_item`.
#> i `last_item = case_when(...)`.
#> x must have class `call`, not class `formula`.
#> Caused by error in `glubort()`:
#> ! must have class `call`, not class `formula`.
Created on 2022-03-15 by the reprex package (v2.0.1)
You can do:
my_df |>
  rowwise() |>
  mutate(last_item = ifelse(length(nuts) == 0L, unlist(nuts), nuts[[length(nuts)]])) |>
  ungroup()
# A tibble: 2 x 3
txt nuts last_item
<chr> <list> <chr>
1 chestnut, pear, kiwi, peanut <chr [2]> peanut
2 grapes, banana <chr [0]> NA
library(tidyverse)
my_df <- tibble(
txt = c(
"chestnut, pear, kiwi, peanut",
"grapes, banana"
)
) %>%
mutate(nuts = str_extract_all(txt, regex("\\w*nut\\w*")))
my_df %>%
  mutate(
    last_item = nuts %>% map_chr(last)
  )
#> # A tibble: 2 × 3
#> txt nuts last_item
#> <chr> <list> <chr>
#> 1 chestnut, pear, kiwi, peanut <chr [2]> peanut
#> 2 grapes, banana <chr [0]> <NA>
my_df %>%
  mutate(
    # cannot use map_chr here: tail() on character(0) does not throw an error,
    # so possibly() never substitutes NA and the second result still has length 0
    last_item = nuts %>% map(possibly(~tail(.x, 1), NA))
  )
#> # A tibble: 2 × 3
#> txt nuts last_item
#> <chr> <list> <list>
#> 1 chestnut, pear, kiwi, peanut <chr [2]> <chr [1]>
#> 2 grapes, banana <chr [0]> <chr [0]>
Created on 2022-03-15 by the reprex package (v2.0.0)
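Since tail() on an empty vector returns another empty vector rather than an error, an explicit length check is another option that keeps map_chr usable. A minimal sketch, where last_or_na is a hypothetical helper name and nuts is recreated as a plain list:

```r
library(purrr)

nuts <- list(c("chestnut", "peanut"), character(0))

# Hypothetical helper: return the last element, or NA_character_ for an empty vector.
last_or_na <- function(x) {
  if (length(x) == 0L) NA_character_ else x[[length(x)]]
}

last_item <- map_chr(nuts, last_or_na)
```

Inside mutate() this would be written as mutate(last_item = map_chr(nuts, last_or_na)).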

How to replace NULL cell by NA in a tibble?

With a tibble, it is possible to have NULL cell with lists:
tibble(x = list(1L, NULL), y = 1:2)
which gives us:
# A tibble: 2 x 2
x y
<list> <int>
1 <int [1]> 1
2 <NULL> 2
that you can explore with View()
How could we replace all NULL cells of a tibble with NA?
The expected output is:
tibble(x = list(1L, NA), y = 1:2)
which produces:
# A tibble: 2 x 2
x y
<list> <int>
1 <int [1]> 1
2 <lgl [1]> 2
I have tried:
is.null(df)
but it does not behave like is.na()...
Then I came with:
map(df, function(l) map(l, function(e) if(is.null(e)) NA else e))
But I struggle to make a new tibble with it:
do.call(as_tibble, map(df, function(l) map(l, function(e) if(is.null(e)) NA else e)))
that gives me an error:
Error: Columns 1 and 2 must be named.
Use .name_repair to specify repair.
Run `rlang::last_error()` to see where the error occurred.
We may use map as
library(purrr)
library(dplyr)
df %>%
mutate(across(where(is.list), map, `%||%`, NA))
# A tibble: 2 × 2
x y
<list> <int>
1 <int [1]> 1
2 <lgl [1]> 2
data
df <- tibble(x = list(1L, NULL), y = 1:2)
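The `%||%` operator used above returns its left-hand side unless that is NULL, in which case it returns the right-hand side. A small sketch of that behaviour on a plain list (the variable names are only illustrative):

```r
library(purrr)

kept <- 1L %||% NA        # left-hand side is not NULL, so it is kept
replaced <- NULL %||% NA  # left-hand side is NULL, so NA is returned

# Applied to every element of a list-column-like list:
x <- list(1L, NULL)
x_filled <- map(x, `%||%`, NA)
```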
A tidyverse approach to achieve your desired result may look like so:
library(purrr)
library(tibble)
library(dplyr)
df <- tibble(x = list(1L, NULL), y = 1:2)
df %>%
mutate(across(where(is.list), ~ purrr::modify_if(.x, is.null, ~ NA)))
#> # A tibble: 2 × 2
#> x y
#> <list> <int>
#> 1 <int [1]> 1
#> 2 <lgl [1]> 2

Removing NULL values from the table

I have a table with NULL values in a column, and those NULL values show up as an extra label in the Highchart graph. How can I use dplyr to get rid of rows that have NULL values in a specific column?
I was thinking of changing the backend SQL queries to filter the result for the desired output, but that does not seem like an appropriate way.
This is not working,
dplyr::filter(!is.na(ColumnWithNullValues)) %>%
Actual code:
df <- data() %>%
dplyr::filter(CreatedBy == 'owner') %>%
dplyr::group_by(`Reason for creation`) %>%
dplyr::arrange(ReasonOrder) %>%
ColumnWithNullValues <- This column has Null values.
Here is one option with base R
df[!sapply(df$ColumnWithNullValues, is.null),]
data
library(tibble)
df <- tibble(
ColumnWithNullValues = list(c(1:5), NULL, c(6:10)))
Here is a small example df:
library(dplyr)
library(purrr)
df <- tibble(
ColumnWithNullValues = list(c(1:5), NULL, c(6:10))
)
df
#> # A tibble: 3 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <NULL>
#> 3 <int [5]>
In this case the most obvious approach might seem to be:
df %>%
filter(!is.null(ColumnWithNullValues))
#> # A tibble: 3 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <NULL>
#> 3 <int [5]>
But as you can see, that does not work. Instead, we need to use map/sapply/vapply to get inside the list. For example like this:
df %>%
filter(map_lgl(ColumnWithNullValues, function(x) !all(is.null(x))))
#> # A tibble: 2 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <int [5]>
But as @akrun has explained in the comments, an element of the list cannot contain NULL among other values. So we can simplify the code to this:
df %>%
filter(!map_lgl(ColumnWithNullValues, is.null))
#> # A tibble: 2 x 1
#> ColumnWithNullValues
#> <list>
#> 1 <int [5]>
#> 2 <int [5]>
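The reason the plain is.null() filter keeps every row can be seen on a bare list: is.null() looks at the list as a whole and returns a single value, so the check has to be applied per element. A short sketch in base R (col is just a stand-in for the list-column):

```r
col <- list(1:5, NULL, 6:10)

# One answer for the whole list: the list itself is not NULL.
whole <- is.null(col)

# One answer per element, which is what filter() needs.
per_element <- vapply(col, is.null, logical(1))

# Keep only the non-NULL elements.
kept <- col[!per_element]
```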

purrr: using %in% with a list-column

I have a column of question responses and a column of possible correct_answers. I'd like to create a third (logical) column (correct) to show whether a response matches one of the possible correct answers.
I think I may need to use a purrr function but I'm not sure how to use one of the map functions with %in%, for example.
library(tibble)
library(dplyr)
library(purrr)
data <- tibble(
response = c('a', 'b', 'c'),
correct_answers = rep(list(c('a', 'b')), 3)
)
# works but correct answers specified manually
data %>%
mutate(correct = response %in% c('a', 'b'))
#> # A tibble: 3 x 3
#> response correct_answers correct
#> <chr> <list> <lgl>
#> 1 a <chr [2]> TRUE
#> 2 b <chr [2]> TRUE
#> 3 c <chr [2]> FALSE
# doesn't work
data %>%
mutate(correct = response %in% correct_answers)
#> # A tibble: 3 x 3
#> response correct_answers correct
#> <chr> <list> <lgl>
#> 1 a <chr [2]> FALSE
#> 2 b <chr [2]> FALSE
#> 3 c <chr [2]> FALSE
Created on 2018-11-05 by the reprex package (v0.2.1)
%in% doesn't check nested elements inside a list. Use mapply (base R) or map2 (purrr) to loop through the columns and check:
data %>% mutate(correct = mapply(function (res, ans) res %in% ans, response, correct_answers))
# A tibble: 3 x 3
# response correct_answers correct
# <chr> <list> <lgl>
#1 a <chr [2]> TRUE
#2 b <chr [2]> TRUE
#3 c <chr [2]> FALSE
Use map2_lgl:
library(purrr)
data %>% mutate(correct = map2_lgl(response, correct_answers, ~ .x %in% .y))
# A tibble: 3 x 3
# response correct_answers correct
# <chr> <list> <lgl>
#1 a <chr [2]> TRUE
#2 b <chr [2]> TRUE
#3 c <chr [2]> FALSE
Or, as @thelatemail commented, both can be simplified:
data %>% mutate(correct = mapply(`%in%`, response, correct_answers))
data %>% mutate(correct = map2_lgl(response, correct_answers, `%in%`))
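The difference between the two forms can be checked outside of mutate(). With a list on the right-hand side, %in% compares each response against the list elements as wholes (match() converts lists to character vectors), so "a" never equals the pair c("a", "b"). A sketch in base R, with the two columns recreated as plain objects:

```r
response <- c("a", "b", "c")
correct_answers <- rep(list(c("a", "b")), 3)

# Whole-column comparison: no response equals a complete list element.
whole_column <- response %in% correct_answers

# Pairwise comparison: each response is checked against its own set of answers.
# unname() drops the names that mapply() takes from the character input.
pairwise <- unname(mapply(`%in%`, response, correct_answers))
```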
