How to replace NULL cells with NA in a tibble?

With a tibble, it is possible to have NULL cells in a list column:
tibble(x = list(1L, NULL), y = 1:2)
which gives us:
# A tibble: 2 x 2
x y
<list> <int>
1 <int [1]> 1
2 <NULL> 2
which you can explore with View().
How could we replace all NULL cells of a tibble with NA?
The expected output is:
tibble(x = list(1L, NA), y = 1:2)
which produces:
# A tibble: 2 x 2
x y
<list> <int>
1 <int [1]> 1
2 <lgl [1]> 2
I have tried:
is.null(df)
but it does not behave like is.na()...
Then I came up with:
map(df, function(l) map(l, function(e) if(is.null(e)) NA else e))
But I struggle to make a new tibble with it:
do.call(as_tibble, map(df, function(l) map(l, function(e) if(is.null(e)) NA else e)))
that gives me an error:
Error: Columns 1 and 2 must be named.
Use .name_repair to specify repair.
Run `rlang::last_error()` to see where the error occurred.
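As far as I can tell, the error comes from do.call() rather than from the recoding: do.call(as_tibble, lst) calls as_tibble(x = lst$x, y = lst$y), and because as_tibble()'s own first argument is also named x, the x column is matched to it and arrives as a bare, unnamed list of two cells ("Columns 1 and 2 must be named"). A minimal sketch, reusing the question's recoding, passes the named list as one argument instead:

```r
library(purrr)
library(tibble)

df <- tibble(x = list(1L, NULL), y = 1:2)

# The question's recoding: a named list with one element per column
fixed <- map(df, function(l) map(l, function(e) if (is.null(e)) NA else e))

# Pass the named list as a single argument instead of splicing it with do.call()
res <- as_tibble(fixed)
res
```

Note that this also turns y into a list column, because map() always returns a list; restricting the recoding to list columns (for example with where(is.list)) avoids that.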

We may use map with purrr's `%||%` operator, which replaces NULL with a default value:
library(purrr)
library(dplyr)
df %>%
mutate(across(where(is.list), ~ map(.x, `%||%`, NA)))
# A tibble: 2 × 2
x y
<list> <int>
1 <int [1]> 1
2 <lgl [1]> 2
data
df <- tibble(x = list(1L, NULL), y = 1:2)
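To see why this works cell by cell, here is a small sketch of `%||%` on its own. purrr exports it (base R also ships one from R 4.4.0); the two-element list is just illustrative data:

```r
library(purrr)  # exports %||%; base R also provides it from R 4.4.0

# %||% returns its left side unless that side is NULL,
# in which case it returns the right side
NULL %||% NA
1L %||% NA

# map() applies it to every cell, so only the NULL cells become NA
cells <- map(list(1L, NULL), `%||%`, NA)
cells
```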

A tidyverse approach to achieve your desired result may look like so:
library(purrr)
library(tibble)
library(dplyr)
df <- tibble(x = list(1L, NULL), y = 1:2)
df %>%
mutate(across(where(is.list), ~ purrr::modify_if(.x, is.null, ~ NA)))
#> # A tibble: 2 × 2
#> x y
#> <list> <int>
#> 1 <int [1]> 1
#> 2 <lgl [1]> 2
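If you would rather not depend on purrr, a base R sketch of the same idea could look like the following. The helper name null_to_na is hypothetical; the key detail is assigning list(NA) rather than NULL, because assigning NULL to a list element deletes it:

```r
library(tibble)

df <- tibble(x = list(1L, NULL), y = 1:2)

# Hypothetical helper: replace NULL cells of a list column with NA.
# Assigning list(NA) keeps the cell; assigning NULL would delete it.
null_to_na <- function(col) {
  if (is.list(col)) {
    col[vapply(col, is.null, logical(1))] <- list(NA)
  }
  col
}

# Apply to every column (non-list columns are returned unchanged) and rebuild
res <- as_tibble(lapply(df, null_to_na))
res
```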

Related

Remove empty lists from a tibble in R

I am trying to remove any list column from my tibble whose cells are all empty ("<chr [0]>"), and to replace the remaining empty cells with NA:
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df
#> # A tibble: 3 × 3
#> x y z
#> <int> <list> <list>
#> 1 1 <chr [0]> <chr [1]>
#> 2 2 <chr [0]> <chr [1]>
#> 3 3 <chr [0]> <chr [0]>
Created on 2022-02-15 by the reprex package (v2.0.1)
I want my tibble to look like this
#> # A tibble: 3 × 3
#> x z
#> <int> <list>
#> 1 1 <chr [1]>
#> 2 2 <chr [1]>
#> 3 3 NA
Any help is appreciated.
You can do:
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(z = lapply(z, function(x) if (length(x) == 0) NA else x))
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <lgl [1]>
Note that in your z column you can't have list elements for rows 1 and 2 and a bare logical NA for row 3. The whole column needs to be a list.
If every element of z has exactly one value, you can add another line of code with mutate(z = unlist(z)).
The OP asked for a more dynamic solution that handles several columns.
Here is an example where I simply created another z2 variable. In general, you can repeat the recoding for several columns using across.
library(tidyverse)
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()),
z2 = z)
df %>%
select(where(~!all(lengths(.) == 0))) %>%
mutate(across(starts_with('z'), ~ lapply(., function(x) if (length(x) == 0) NA else x)))
Which gives:
# A tibble: 3 x 3
x z z2
<int> <list> <list>
1 1 <chr [1]> <chr [1]>
2 2 <chr [1]> <chr [1]>
3 3 <lgl [1]> <lgl [1]>
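A scalar if/else is safer than ifelse() for this kind of recoding. ifelse() returns a result shaped like its test, so with a single TRUE/FALSE test it keeps only the first value of a cell. In the question every cell has length 0 or 1, so it makes no difference there; the two-value vector below is hypothetical data chosen to show the difference:

```r
x <- c("ATC", "GGC")  # hypothetical cell with two values

# ifelse() returns a result as long as its test (length 1 here),
# so only the first element of the "no" branch survives
ifelse(length(x) == 0, NA, x)

# A scalar if/else returns the whole branch value
if (length(x) == 0) NA else x
```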
A two-step way using base R (note from the output below that the empty cell in z ends up as NULL rather than NA):
df <- tibble(x = 1:3, y = list(as.character()),
z=list(as.character("ATC"),as.character("TAC"), as.character()))
df <- df[apply(df, 2, function(x) any(lapply(x, length) > 0))] #Remove empty columns
df[apply(df, 2, function(x) lapply(x, length) == 0)] <- NA #Replace empty lists with NA
df
# A tibble: 3 x 2
x z
<int> <list>
1 1 <chr [1]>
2 2 <chr [1]>
3 3 <NULL>
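A base R sketch of the same two steps that does produce an NA cell, assuming the goal is the output requested in the question, could use lengths() to find empty cells and assign list(NA) to them:

```r
library(tibble)

df <- tibble(x = 1:3, y = list(as.character()),
             z = list(as.character("ATC"), as.character("TAC"), as.character()))

# Step 1: drop list columns in which every cell is empty
all_empty <- vapply(df, function(col) is.list(col) && all(lengths(col) == 0),
                    logical(1))
df <- df[!all_empty]

# Step 2: in the remaining list columns, replace empty cells with NA.
# Assigning list(NA) stores an NA cell instead of removing the cell.
for (nm in names(df)[vapply(df, is.list, logical(1))]) {
  col <- df[[nm]]
  col[lengths(col) == 0] <- list(NA)
  df[[nm]] <- col
}
df
```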

How to transpose a list column and add the resulting list as columns?

I used a "safe" function (created with purrr::safely) to do some webscraping, adding the results to a data frame as a list column. The safe function outputs have, of course, two elements each, "result" and "error", and I'd like to convert these to data frame columns and unnest results. I can do this one at a time, but I would like a scalable solution (in case there are more columns).
Here's a simple example:
library(dplyr)
library(purrr)
library(tidyr)  # for unnest()
foo = function(x) {
if(x == 1) stop("x cannot be 1")
data.frame(v1 = 1:2, v2 = c("a", "b"))
}
foo_safe = purrr::safely(foo)
input = data.frame(x = 0:2)
(input = input %>%
mutate(foo_output = map(x, foo_safe)))
# x foo_output
# 1 0 1, 2, a, b
# 2 1 x cannot be 1, .f(...)
# 3 2 1, 2, a, b
## desired output
input %>%
mutate(
result = map(foo_output, pluck, "result"),
error = map(foo_output, pluck, "error")
) %>%
unnest(result, keep_empty = TRUE)
# # A tibble: 5 x 5
# x foo_output v1 v2 error
# <int> <list> <int> <chr> <list>
# 1 0 <named list [2]> 1 a <NULL>
# 2 0 <named list [2]> 2 b <NULL>
# 3 1 <named list [2]> NA NA <smplErrr>
# 4 2 <named list [2]> 1 a <NULL>
# 5 2 <named list [2]> 2 b <NULL>
purrr::transpose flips the list to the right hierarchy, but there's still a challenge to get the transposed list added as columns. as.data.frame() will fail, but as_tibble() is happy to create a data frame with list columns that we can bind to the original data:
input %>%
bind_cols(
as_tibble(transpose(.$foo_output))
) %>%
unnest(result, keep_empty = TRUE)
# # A tibble: 5 x 5
# x foo_output v1 v2 error
# <int> <list> <int> <chr> <list>
# 1 0 <named list [2]> 1 a <NULL>
# 2 0 <named list [2]> 2 b <NULL>
# 3 1 <named list [2]> NA NA <smplErrr>
# 4 2 <named list [2]> 1 a <NULL>
# 5 2 <named list [2]> 2 b <NULL>
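To make the "right hierarchy" concrete, here is a small sketch of what transpose() does to the safely() output, using the question's foo(). Each input yields list(result = ..., error = ...); transpose() regroups three such pairs into two lists of three, one per name, which is what as_tibble() then turns into the result and error columns:

```r
library(purrr)

foo <- function(x) {
  if (x == 1) stop("x cannot be 1")
  data.frame(v1 = 1:2, v2 = c("a", "b"))
}
foo_safe <- safely(foo)

# One list(result = ..., error = ...) per input
out <- map(0:2, foo_safe)

# "3 lists of 2" become "2 lists of 3": all results, then all errors
flipped <- transpose(out)
names(flipped)
lengths(flipped)
```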

How to extract first value from lists in data.frames columns?

This question is similar to R: How to extract a list from a dataframe?
But I could not implement it to my question in an easy way.
weird_df <- tibble(col1 = c('hello', 'world', 'again'),
col_weird = list(list(12, 23), list(23, 24), NA),
col_weird2 = list(list(0, 45), list(4, 45), list(45, 45.45, 23)))
weird_df
# A tibble: 3 x 3
col1 col_weird col_weird2
<chr> <list> <list>
1 hello <list [2]> <list [2]>
2 world <list [2]> <list [2]>
3 again <lgl [1]> <list [3]>
I want the columns col_weird and col_weird2 to display only the first value of each list.
col1 col_weird col_weird2
1 hello 12 0
2 world 23 4
3 again NA 45
My real problem has a lot of columns. I tried this (an altered version of the accepted answer in the linked question):
library(dplyr)
library(tidyr)
library(purrr)
weird_df %>%
mutate(col_weird = map(c(col_weird,col_weird2), toString ) ) %>%
separate(col_weird, into = c("col1"), convert = TRUE) %>%
separate(col_weird2, into = c("col2"), convert = TRUE)
One solution would be to write a simple function that extracts the first value from each list in a vector of lists. You can then apply it to the relevant columns in your data frame.
library(tibble)
#create data
weird_df <- tibble(col1 =c('hello', 'world', 'again'),
col_weird = list(list(12,23), list(23,24), NA),
col_weird2 = list(list(0,45), list(4,45), list(45,45.45,23)))
#function to extract first values from a vector of lists
fnc <- function(x) {
sapply(x, FUN = function(y) {y[[1]]})
}
#apply function to the relevant columns
weird_df[,2:3] <- apply(weird_df[,2:3], MARGIN = 2, FUN = fnc)
weird_df
# A tibble: 3 x 3
col1 col_weird col_weird2
<chr> <dbl> <dbl>
1 hello 12 0
2 world 23 4
3 again NA 45
Here is a dplyr solution
library(dplyr)
weird_df %>% mutate(across(c(col_weird, col_weird2), ~vapply(., `[[`, numeric(1L), 1L)))
Output
# A tibble: 3 x 3
col1 col_weird col_weird2
<chr> <dbl> <dbl>
1 hello 12 0
2 world 23 4
3 again NA 45
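Since the real data has many columns, a variant that does not name them may help. This sketch assumes that every list column should be reduced to its first value and that those values are numeric; where(is.list) picks up all such columns, and as.numeric() turns the bare NA cell into NA_real_:

```r
library(dplyr)
library(purrr)
library(tibble)

weird_df <- tibble(col1 = c('hello', 'world', 'again'),
                   col_weird = list(list(12, 23), list(23, 24), NA),
                   col_weird2 = list(list(0, 45), list(4, 45), list(45, 45.45, 23)))

# where(is.list) selects every list column, however many there are;
# each cell is reduced to its first value as a double
res <- weird_df %>%
  mutate(across(where(is.list), ~ map_dbl(.x, function(cell) as.numeric(cell[[1]]))))
res
```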

summarize to vector output

Let's say I have the following (simplified) tibble containing a group and values in vectors:
set.seed(1)
(tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
values = replicate(5, sample(3), simplify = FALSE)))
# A tibble: 5 x 2
group values
<fct> <list>
1 A <int [3]>
2 A <int [3]>
3 B <int [3]>
4 B <int [3]>
5 B <int [3]>
tb_vec[[1,2]]
[1] 1 3 2
I would like to summarize the values vectors per group by summing them (vectorized) and tried the following:
tb_vec %>% group_by(group) %>%
summarize(vec_sum = colSums(purrr::reduce(values, rbind)))
Error: Column vec_sum must be length 1 (a summary value), not 3
The error surprises me, because tibbles (the output format) can contain vectors as well.
My expected output would be the following summarized tibble:
# A tibble: 2 x 2
group vec_sum
<fct> <list>
1 A <dbl [3]>
2 B <dbl [3]>
Is there a tidyverse solution to accommodate the vector output of summarize? I want to avoid splitting the tibble, because then I lose the factor.
You just need to wrap the result in list() within summarise, so that vec_sum becomes a column with 2 elements, each a vector of 3 values:
library(tidyverse)
set.seed(1)
(tb_vec <- tibble(group = factor(rep(c("A","B"), c(2,3))),
values = replicate(5, sample(3), simplify = FALSE)))
tb_vec %>%
group_by(group) %>%
summarize(vec_sum = list(colSums(purrr::reduce(values, rbind)))) -> res
res$vec_sum
# [[1]]
# [1] 2 4 6
#
# [[2]]
# [1] 6 5 7
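As an alternative to stacking with rbind() and calling colSums(), Reduce(`+`, values) adds a group's vectors element-wise directly; it still needs list() so each group's length-3 sum fits in one cell. The sketch uses small fixed example values (not the set.seed(1) data), and because the inputs are integers the sums stay integer rather than double:

```r
library(dplyr)
library(tibble)

# Small fixed example values, not the question's random data
tb <- tibble(group = factor(c("A", "A", "B")),
             values = list(c(1L, 3L, 2L), c(3L, 1L, 2L), c(2L, 2L, 2L)))

# Reduce(`+`, values) adds the vectors of a group element-wise;
# list() stores the length-3 sum in a single cell per group
res <- tb %>%
  group_by(group) %>%
  summarize(vec_sum = list(Reduce(`+`, values)))
res$vec_sum
```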

filter list variable in dplyr

In general how do we filter by a list variable in dplyr?
E.g. a data frame where one variable is a list of different classes of object:
aa <- tibble(ss = c(1,2),
dd = list(NA,
matrix(data = c(1,2,3,4),
nrow = 2,
ncol = 2)))
> aa
# A tibble: 2 x 2
# ss dd
# <dbl> <list>
#1 1.00 <lgl [1]>
#2 2.00 <dbl [2 × 2]>
For example, if I want to filter for logicals (though it could be any class), with a non-list column it would be as simple as:
aa %>% filter(is.logical(dd))
But this returns
# A tibble: 0 x 2
# ... with 2 variables: ss <dbl>, dd <list>
This is because aa$dd[1] is a list containing the logical, while aa$dd[[1]] is the logical itself:
> is.logical(aa$dd[1])
# [1] FALSE
> is.logical(aa$dd[[1]])
# [1] TRUE
One may use purrr::map for other operations on nested list variables, but it doesn't work here either:
> aa %>% filter(map(.x = dd,
+ .f = is.logical))
# Error in filter_impl(.data, quo) : basic_string::resize
What am I missing here?
As 'dd' is a list column, we can loop through it using map. Because each element of 'dd' can have more than one value, we keep a row only if all of its values are NA:
library(tidyverse)
aa %>%
filter(map_lgl(dd, ~ .x %>%
is.na %>%
all))
# A tibble: 1 x 2
# ss dd
# <dbl> <list>
#1 1 <lgl [1]>
If this is about filtering based on class:
aa %>%
filter(map_lgl(dd, is.logical))
# A tibble: 1 x 2
# ss dd
# <dbl> <list>
#1 1 <lgl [1]>
In the OP's code, the output of map is still a list; map_lgl converts it to the logical vector that filter() needs.
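A small sketch of the difference, using the question's aa: map() returns a list with one element per row, while map_lgl() returns a plain logical vector, which is the form filter() expects:

```r
library(dplyr)
library(purrr)
library(tibble)

aa <- tibble(ss = c(1, 2),
             dd = list(NA, matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)))

map(aa$dd, is.logical)      # a list: list(TRUE, FALSE)
map_lgl(aa$dd, is.logical)  # a logical vector: c(TRUE, FALSE)

# filter() needs a logical vector with one value per row
res <- aa %>% filter(map_lgl(dd, is.logical))
res
```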
The best I can do is to create a dummy variable using is.logical with purrr::map, unlist it, filter by it, and then drop the dummy variable. It works, but what a kerfuffle.
aa %>%
mutate(ff = map(.x = dd,
.f = is.logical),
ff = unlist(ff)) %>%
filter(ff == TRUE) %>%
select(-ff)
# A tibble: 1 x 2
# ss dd
# <dbl> <list>
# 1 1.00 <lgl [1]>
