Ensure that data frames become tibbles when reading MongoDB data with {mongolite}

Ensure that data frames become tibbles when reading MongoDB data with {mongolite} - r

I have to deal with JSON documents that contain nested documents and at some level have an array which in turn contains individual documents that conceptionally would map back to a "data frame row" when reading/parsing the JSON in R.
How can I ensure that all data frames are casted into tibbles when
retrieving data from the database?
Desired result for example data below
Desired result
query_res$levelOne <- query_res$levelOne %>% tibble::as_tibble()
query_res$levelOne$levelTwo <- query_res$levelOne$levelTwo %>%
tibble::as_tibble()
query_res$levelOne$levelTwo$levelThree <- query_res$levelOne$levelTwo$levelThree %>%
purrr::map(tibble::as_tibble)
query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
If I try to do that via dplyr::mutate() or purrr::map*_df(), I get the Error: Columnis of unsupported class data.frame error.
Related post
Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON
Example
JSON data to put into file dump.json
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}
Importing JSON into MongoDB
con <- mongolite::mongo(
db = "stackoverflow",
collection = "nested_json"
)
con$import(file("dump.json"))
This is what you should see within MongoDB
Query via $find()
query_res <- con$find() %>%
tibble::as_tibble()
query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
Query via $iterate()
it <- con$iterate()
iter_res <- list()
while(!is.null(x <- it$one())) {
# Ensure array columns stay individual list columns when casting to tibble:
# (As opposed to multiple array items being turned into one tibble row)
p <- function(x) {
is.list(x) &&
is.null(names(x))
}
f <- function(x) {
list(x %>% unlist())
}
x <- x %>% purrr::map_if(p, f)
# Necessary to get the `simplifyVector = TRUE` effect:
iter_res_current <- x %>%
jsonlite:::simplify() %>%
tibble::as_tibble()
# Combine with previous iteration results:
iter_res <- c(iter_res, list(iter_res_current))
}
iter_res_df <- iter_res %>%
dplyr::bind_rows()
iter_res_df %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:List of 2
# ..$ :List of 1
# .. ..$ levelThree:'data.frame': 2 obs. of 3 variables:
# .. .. ..$ x: chr "A" "B"
# .. .. ..$ y: int 1 2
# .. .. ..$ z: logi TRUE FALSE
# ..$ :List of 1
# .. ..$ levelThree:'data.frame': 2 obs. of 3 variables:
# .. .. ..$ x: chr "A" "B"
# .. .. ..$ y: int 10 20
# .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"

Related

handling lists in lists to Dataframe in R

I´m new and i have some problems handling list and transform to dataframe
I have a list "ddt"
str(ddt)
List of 4
$ id : chr "18136"
$ comments.data:List of 3
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T16:17:32+0000"
.. ..$ text : chr "Mocaaa"
.. ..$ id : chr "18096"
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T16:00:00+0000"
.. ..$ text : chr "Capucchino"
.. ..$ id : chr "17846"
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T14:42:53+0000"
.. ..$ text : chr "Mocachino"
.. ..$ id : chr "18037"
$ id : chr "17920"
$ comments.data:List of 1
..$ :List of 3
.. ..$ timestamp: chr "2020-05-24T15:31:30+0000"
.. ..$ text : chr "Hello"
.. ..$ id : chr "18054"
And i need this result
id timestamp text id2
1 18136 2020-05-25T16:17:32+0000 Mocaaa 18096
2 18136 2020-05-25T16:00:00+0000 Capucchino 17846
3 18136 2020-05-25T14:42:53+0000 Mocachino 18037
4 17920 2020-05-24T15:31:30+0000 Hello 18054

I think this can be done well with data.table.
set.seed(42)
df <- replicate(2, list(id = sample(1e5, 1), comments = replicate(3, list(tm = as.character(Sys.time() + sample(10, 1)), text = sample(LETTERS, 1), id = sample(1e5, 1)), simplify = FALSE)), simplify = FALSE)
str(df)
# List of 2
# $ :List of 2
# ..$ id : int 91481
# ..$ comments:List of 3
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:08"
# .. .. ..$ text: chr "H"
# .. .. ..$ id : int 83045
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:05"
# .. .. ..$ text: chr "N"
# .. .. ..$ id : int 73659
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:00"
# .. .. ..$ text: chr "R"
# .. .. ..$ id : int 70507
# $ :List of 2
# ..$ id : int 45775
# ..$ comments:List of 3
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:06"
# .. .. ..$ text: chr "Y"
# .. .. ..$ id : int 25543
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:03"
# .. .. ..$ text: chr "Y"
# .. .. ..$ id : int 97823
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:00"
# .. .. ..$ text: chr "M"
# .. .. ..$ id : int 56034
One thing we'll have to contend with is that you have id on the top-level as well as internally within each list.
library(data.table)
library(magrittr) # for %>%, demonstrative only, can be done without
data.table::rbindlist(df) %>%
.[, comments := lapply(comments, as.data.table) ] %>%
# we have a duplicate name 'id', rename in the inner ones
.[, comments := lapply(comments, setnames, "id", "innerid") ] %>%
.[, unlist(comments, recursive = FALSE), by = seq_len(nrow(.)) ]
# seq_len tm text innerid
# 1: 1 2020-05-26 14:49:21 H 83045
# 2: 2 2020-05-26 14:49:18 N 73659
# 3: 3 2020-05-26 14:49:13 R 70507
# 4: 4 2020-05-26 14:49:19 Y 25543
# 5: 5 2020-05-26 14:49:16 Y 97823
# 6: 6 2020-05-26 14:49:13 M 56034
I suspect that the by=seq_len(nrow(.)) is not going to scale well to larger data. Since Rdatatable/data.table#3672 is still open, an alternative is to replace the last line (including unlist and seq_len) with just %>% tidyr::unnest(comments). I suspect that the combination of data.table and tidyr is at times contentious, I suggest that this non-partisan approach capitalizes on the strengths of both.

The structure seems to look just like a java script object.
You could do:
library(jsonlite)
library(tidyr)
unnest(unnest(fromJSON(toJSON(df))))
# A tibble: 6 x 4
id tm text id1
<int> <chr> <chr> <int>
1 92345 2020-05-26 14:53:53 X 6730
2 92345 2020-05-26 14:53:56 Q 92812
3 92345 2020-05-26 14:53:56 D 25304
4 9847 2020-05-26 14:53:56 E 82734
5 9847 2020-05-26 14:54:01 I 75079
6 9847 2020-05-26 14:54:02 H 89373

Tibble columns of class tibble instead of class data frame

What's the tidy way of having tibble columns of class tibble (instead of class list or data.frame)?
It's clearly possible to have columns of class data.frame in tibbles (see
example below), but none of the "tidy ways of data manipulation" (i.e.
dplyr::mutate() or purrr::map*_df()) seem to work for me when trying to cast the columns to tibble instead of data.frame
Current ouput of jsonlite::fromJSON()
# 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
Desired result
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
Why having data.frame columns can be very misleading
https://hendrikvanb.gitlab.io/2018/07/nested_data-json_to_tibble/
Related
Recursively ensuring tibbles instead of data frames when parsing/manipulating nested JSON
Ensure that data frames become tibbles when reading MongoDB data with {mongolite}
Example
Example data
library(magrittr)
json <- '[
{
"labels": ["label-a", "label-b"],
"levelOne": {
"levelTwo": {
"levelThree": [
{
"x": "A",
"y": 1,
"z": true
},
{
"x": "B",
"y": 2,
"z": false
}
]
}
},
"schema": "0.0.1"
},
{
"labels": ["label-a", "label-b"],
"levelOne": {
"levelTwo": {
"levelThree": [
{
"x": "A",
"y": 10,
"z": false
},
{
"x": "B",
"y": 20,
"z": true
}
]
}
},
"schema": "0.0.1"
}
]'
When visualizing this, you'll see that there's a subtle but important distinction between objects (which map to data.frames) and array (which map to lists):
Parsing JSON and converting to tibble
x <- json %>%
jsonlite::fromJSON() %>%
tibble::as_tibble()
x %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
So it's clearly possible to have columns that are of class data.frame.
Casting data.frame to tibble columns: "the bad way"
But I'd like tibbles instead of data frames, so let's try the only thing I got
to work: explicit re-assigning the respective list levels, or data frame/tibble
columns, to be more precise:
# Make a copy so we don't mess with the initial state of `x`
y <- x
y$levelOne <- y$levelOne %>%
tibble::as_tibble()
y$levelOne$levelTwo <- y$levelOne$levelTwo %>%
tibble::as_tibble()
y$levelOne$levelTwo$levelThree <- y$levelOne$levelTwo$levelThree %>%
purrr::map(tibble::as_tibble)
x %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
That works, but is not in line with "tidy data manipulation pipes".
Casting data.frame to tibble columns: "the better way" (trying and failing)
# Yet another copy so we can compare:
z <- x
# Just to check that this works
z$levelOne %>%
tibble::as_tibble()
# # A tibble: 2 x 1
# levelTwo$levelThree
# <list>
# 1 <df[,3] [2 × 3]>
# 2 <df[,3] [2 × 3]>
# Trying to get this to work with `dplzr::mutate()` fails:
z %>%
dplyr::mutate(levelOne = levelOne %>%
tibble::as_tibble()
)
# Error: Column `levelOne` is of unsupported class data.frame
z %>%
dplyr::transmute(levelOne = levelOne %>%
tibble::as_tibble()
)
# Error: Column `levelOne` is of unsupported class data.frame
# Same goes for `{purrr}`:
z %>%
dplyr::mutate(levelOne = levelOne %>%
purrr::map_df(tibble::as_tibble)
)
# Error: Column `levelOne` is of unsupported class data.frame
z %>%
tibble::add_column(levelOne = z$levelOne %>% tibble::as_tibble())
# Error: Can't add duplicate columns with `add_column()`:
# * Column `levelOne` already exists in `.data`.
# Works, but not what I want:
z %>%
tibble::add_column(test = z$levelOne %>% tibble::as_tibble()) %>%
str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables:
# [...]
# $ test :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# ..$ levelTwo:'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
The only thing that worked (is not what we want)
Wrapping tibble::as_tibble() by purrr::map() seems to work, but the result is clearly not what we want as we duplicate everything below levelOne (compare to desired output above)
# Works, but not what I want:
z_new <- z %>%
dplyr::mutate(levelOne = levelOne %>%
purrr::map(tibble::as_tibble)
)
z_new %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:List of 2
# ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 1 variable:
# .. ..$ levelThree:List of 2
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
EDIT (follow-up investigation)
Got it to work with Hendrik's help!
Still, IMO this topic raises some interesting follow-up questions regarding
whether or not one should - or even could - do it any other way if the primary
goal is to end up with tidy nested tibbles that play nicely with
tidyr::unnset() and tidyr::nest() (see comments in Hendrik's answer below).
As to the proposed approach in
https://hendrikvanb.gitlab.io/2018/07/nested_data-json_to_tibble/: I might be
overlooking something obvious, but I think it only works for JSON docs with a
single document.
First, let's modify df_to_tibble() (see Hendrik's answer below) to only turn
"leaf" data frames into tibbles while turning "branch" data frames into lists:
leaf_df_to_tibble <- function(x) {
if (is.data.frame(x)) {
if (!any(purrr::map_lgl(x, is.list))) {
# Only captures "leaf" DFs:
tibble::as_tibble(x)
} else {
as.list(x)
}
} else {
x
}
}
This would give us results that are in line with the proposed way in the blog post, but only for "single object" JSON docs as illustrated below
df <- json %>% jsonlite::fromJSON()
# Only take the first object from the parsed JSON:
df_subset <- df[1, ]
Transforming df_subset:
df_subset_tibble <- purrr::reduce(
0:purrr::vec_depth(df_subset),
function(x, depth) {
purrr::modify_depth(x, depth, leaf_df_to_tibble, .ragged = TRUE)
},
.init = df_subset
) %>%
tibble::as_tibble()
df_subset_tibble %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 3 variables:
# $ labels :List of 1
# ..$ : chr "label-a" "label-b"
# $ levelOne:List of 1
# ..$ levelTwo:List of 1
# .. ..$ levelThree:List of 1
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# $ schema : chr "0.0.1"
Transforming df:
df_tibble <- purrr::reduce(
0:purrr::vec_depth(df),
function(x, depth) {
purrr::modify_depth(x, depth, leaf_df_to_tibble, .ragged = TRUE)
},
.init = df
) %>%
tibble::as_tibble()
df_tibble %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# $ labels :List of 2
# ..$ : chr "label-a" "label-b"
# ..$ : chr "label-a" "label-b"
# $ levelOne:List of 2
# ..$ levelTwo:List of 1
# .. ..$ levelThree:List of 2
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# ..$ levelTwo:List of 1
# .. ..$ levelThree:List of 2
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 1 2
# .. .. .. ..$ z: logi TRUE FALSE
# .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 3 variables:
# .. .. .. ..$ x: chr "A" "B"
# .. .. .. ..$ y: int 10 20
# .. .. .. ..$ z: logi FALSE TRUE
# $ schema : chr "0.0.1" "0.0.1"
As we see, "listifying" nested JSON structures actually may results in copying
the "leafs". It just doesn't jump at you as long as n = 1 (number of JSON
docs), but strikes you as soon as n > 1.

Background
The comments above raise some valid points. Still, I do believe there is a way to achieve what you're after (whether or not this is a particularly good idea is less clear) by leveraging three functions from the purrr package in combination:
purrr::vec_depth allows us to get the (nesting) depth of a given list,
purrr::modify_depth allows us to apply a function to an list at the specified level of depth, and
purrr::reduce allows us to iteratively apply a function and have the result of each iteration be passed as the input to the subsequent iteration.
General approach
In essence, we want to convert any data.frame found at any level in the list to a tibble. This can easily be achieved using several rounds of purrr::modify_depth where we simply alter the depth depending on the level of the list we wish to target. Crucially, however, we want to do this in a way so that changes to level 1, for example, are retained when we move on to targeting level 2; changes to level 1 and 2 are retained when we move on to level 3; and so on. This is where purrr::reduce comes in: each time we apply purrr::modify_depth to convert a data.frame to a tibble, we'll ensure that the resultant output gets passed as the input to the next iteration. This is illustrated in the MWE below
MWE
Start with the basic setup of data structures and libraries
#> Load libraries ----
library(tidyverse)
json <- '[
{
"labels": ["label-a", "label-b"],
"levelOne": {
"levelTwo": {
"levelThree": [
{
"x": "A",
"y": 1,
"z": true
},
{
"x": "B",
"y": 2,
"z": false
}
]
}
},
"schema": "0.0.1"
},
{
"labels": ["label-a", "label-b"],
"levelOne": {
"levelTwo": {
"levelThree": [
{
"x": "A",
"y": 10,
"z": false
},
{
"x": "B",
"y": 20,
"z": true
}
]
}
},
"schema": "0.0.1"
}
]'
# convert json to a nested data.frame
df <- jsonlite::fromJSON(json)
Now we'll create a simple helper function that can conditionally convert data.frame to tibble
# define a simple function to convert data.frame to tibble
df_to_tibble <- function(x) {
if (is.data.frame(x)) as_tibble(x) else x
}
Now for the crucial routine: Taking df as the initial starting point (.init = df), apply the df_to_tibble function at each level of df (0:purrr::vec_depth(df)) using purrr::modify_depth. Use purrr::reduce to ensure that the results from each individual iteration gets passed as the input to the subsequent iteration.
# create df_tibble by reducing the result of applying df_to_tibble to each level
# of df via purrr's modify_depth function %>% lastly, ensure that the top level
# data.frame is also converted to a tibble
df_tibble <- purrr::reduce(
0:purrr::vec_depth(df),
function(x, depth) {
purrr::modify_depth(x, depth, df_to_tibble, .ragged = TRUE)
},
.init = df
) %>%
as_tibble()
# show the structure of df_tibble
str(df_tibble)
#> Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 3 variables:
#> $ labels :List of 2
#> ..$ : chr "label-a" "label-b"
#> ..$ : chr "label-a" "label-b"
#> $ levelOne:Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 1 variable:
#> ..$ levelTwo:Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 1 variable:
#> .. ..$ levelThree:List of 2
#> .. .. ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 3 variables:
#> .. .. .. ..$ x: chr "A" "B"
#> .. .. .. ..$ y: int 1 2
#> .. .. .. ..$ z: logi TRUE FALSE
#> .. .. ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 3 variables:
#> .. .. .. ..$ x: chr "A" "B"
#> .. .. .. ..$ y: int 10 20
#> .. .. .. ..$ z: logi FALSE TRUE
#> $ schema : chr "0.0.1" "0.0.1"

Insert elements into a list based on depth and `if` conditions using modify_depth and modify_if (purrr)

I'm learning some purrr commands, specifically the modify_* family of functions. I'm attemping to add price bins to items found in a grocery store (see below for my attempt and error code).
library(tidyverse)
Data
easybuy <- list(
"5520 N Division St, Spokane, WA 99208, USA",
list("bananas", "oranges"),
canned = list("olives", "fish", "jam"),
list("pork", "beef"),
list("hammer", "tape")
) %>%
map(list) %>%
# name the sublists
set_names(c("address",
"fruit",
"canned",
"meat",
"other")) %>%
# except for address, names the sublists "items"
modify_at(c(2:5), ~ set_names(.x, "items"))
Take a peek:
glimpse(easybuy)
#> List of 5
#> $ address:List of 1
#> ..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
#> $ fruit :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "bananas"
#> .. ..$ : chr "oranges"
#> $ canned :List of 1
#> ..$ items:List of 3
#> .. ..$ : chr "olives"
#> .. ..$ : chr "fish"
#> .. ..$ : chr "jam"
#> $ meat :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "pork"
#> .. ..$ : chr "beef"
#> $ other :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "hammer"
#> .. ..$ : chr "tape"
My Attempt
Idea: go in a depth of two, and look for "items", append a "price". I'm not sure if I can nest the modify functions like this.
easybuy %>%
modify_depth(2, ~ modify_at(., "items", ~ append("price")))
#> Error: character indexing requires a named object
Desired
I would like the following structure (note the addition of "price" under each item):
List of 5
$ address:List of 1
..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
$ fruit :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "bananas"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "oranges"
.. .. ..$ : chr "price"
$ canned :List of 1
..$ items:List of 3
.. ..$ :List of 2
.. .. ..$ : chr "olives"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "fish"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "jam"
.. .. ..$ : chr "price"
$ meat :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "pork"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "beef"
.. .. ..$ : chr "price"
$ other :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "hammer"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "tape"
.. .. ..$ : chr "price"

This seems working. The map_if and function(x) !is.null(names(x)) make sure the change only happen if the name of the item is not NULL. ~modify_depth(.x, 2, function(y) list(y, "price")) creates the list you need.
library(tidyverse)
easybuy2 <- easybuy %>%
map_if(function(x) !is.null(names(x)),
~modify_depth(.x, 2, function(y) list(y, "price")))
Here is how the second item looks like.
easybuy2[[2]][[1]]
# [[1]]
# [[1]][[1]]
# [1] "bananas"
#
# [[1]][[2]]
# [1] "price"
#
#
# [[2]]
# [[2]][[1]]
# [1] "oranges"
#
# [[2]][[2]]
# [1] "price"
Or this also works.
easybuy3 <- easybuy %>%
modify_at(2:5, ~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy3)
# [1] TRUE
Update
easybuy4 <- easybuy %>%
map_if(function(x){
name <- names(x)
if(is.null(name)){
return(FALSE)
} else {
return(name %in% "items")
}
},
~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy4)
# [1] TRUE

Convert Nested List into data.frame with different column length

I was trying to convert below nested list into data.frame but without luck. There are a few complications, mainly the column "results" of position 1 is inconsistent with position 2, as there is no result in position 2.
item length inconsistent across different positions
[[1]]
[[1]]$html_attributions
list()
[[1]]$results
geometry.location.lat geometry.location.lng
1 25.66544 -100.4354
id place_id
1 6ce0a030663144c8e992cbce51eb00479ef7db89 ChIJVy7b7FW9YoYRdaH2I_gOJIk
reference
1 CmRSAAAATdtVfB4Tz1aQ8GhGaw4-nRJ5lZlVNgiOR3ciF4QjmYC56bn6b7omWh1SJEWWqQQEFNXxGZndgEwSgl8sRCOtdF8aXpngUY878Q__yH4in8EMZMCIqSHLARqNgGlV4mKgEhDlvkHLXLiBW4F_KQVT83jIGhS5DJipk6PAnpPDXP2p-4X5NPuG9w
[[1]]$status
[1] "OK"
[[2]]
[[2]]$html_attributions
list()
[[2]]$results
list()
[[2]]$status
[1] "ZERO_RESULTS"
I tried the following codes but they aint' working.
#1
m1 <- do.call(rbind, lapply(myDataFrames, function(y) do.call(rbind, y)))
relist(m1, skeleton = myDataFrames)
#2
relist(matrix(unlist(myDataFrames), ncol = 4, byrow = T), skeleton = myDataFrames)
#3
library(data.table)
df<-rbindlist(myDataFrames, idcol = "index")
df<-rbindlist(myDataFrames, fill=TRUE)
#4
myDataFrame <- do.call(rbind.data.frame, c(myDataFrames, list(stringsAsFactors = FALSE)))

I think I have enough of the original JSON to be able to create a reproducible example:
okjson <- '{"html_attributions":[],"results":[{"geometry":{"location":{"lat":25.66544,"lon":-100.4354},"id":"foo","place_id":"quux"}}],"status":"OK"}'
emptyjson <- '{"html_attributions":[],"results":[],"status":"ZERO_RESULTS"}'
jsons <- list(okjson, emptyjson, okjson)
From here, I'll step (slowly) through the process. I've included much of the intermediate structure for reproducibility, I apologize for the verbosity. This can easily be grouped together and/or put within a magrittr pipeline.
lists <- lapply(jsons, jsonlite::fromJSON)
str(lists)
# List of 3
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results :'data.frame': 1 obs. of 1 variable:
# .. ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ lat: num 25.7
# .. .. .. ..$ lon: num -100
# .. .. ..$ id : chr "foo"
# .. .. ..$ place_id: chr "quux"
# ..$ status : chr "OK"
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results : list()
# ..$ status : chr "ZERO_RESULTS"
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results :'data.frame': 1 obs. of 1 variable:
# .. ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ lat: num 25.7
# .. .. .. ..$ lon: num -100
# .. .. ..$ id : chr "foo"
# .. .. ..$ place_id: chr "quux"
# ..$ status : chr "OK"
goodlists <- Filter(function(a) "results" %in% names(a) && length(a$results) > 0, lists)
goodresults <- lapply(goodlists, `[[`, "results")
str(goodresults)
# List of 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. ..$ lat: num 25.7
# .. .. ..$ lon: num -100
# .. ..$ id : chr "foo"
# .. ..$ place_id: chr "quux"
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. ..$ lat: num 25.7
# .. .. ..$ lon: num -100
# .. ..$ id : chr "foo"
# .. ..$ place_id: chr "quux"
goodresultsdf <- lapply(goodresults, function(a) jsonlite::flatten(as.data.frame(a)))
str(goodresultsdf)
# List of 2
# $ :'data.frame': 1 obs. of 4 variables:
# ..$ geometry.id : chr "foo"
# ..$ geometry.place_id : chr "quux"
# ..$ geometry.location.lat: num 25.7
# ..$ geometry.location.lon: num -100
# $ :'data.frame': 1 obs. of 4 variables:
# ..$ geometry.id : chr "foo"
# ..$ geometry.place_id : chr "quux"
# ..$ geometry.location.lat: num 25.7
# ..$ geometry.location.lon: num -100
We now have a list-of-data.frames, a good place to be.
do.call(rbind.data.frame, c(goodresultsdf, stringsAsFactors = FALSE))
# geometry.id geometry.place_id geometry.location.lat geometry.location.lon
# 1 foo quux 25.66544 -100.4354
# 2 foo quux 25.66544 -100.4354

Removing elements in a nested R list by name

I have a nested element like this
> x <- list(a=list(from="me", id="xyz"), b=list(comment=list(list(message="blabla", id="abc"), list(message="humbug", id="jkl"))), id="123")
> str(x)
List of 3
$ a :List of 2
..$ from: chr "me"
..$ id : chr "xyz"
$ b :List of 1
..$ comment:List of 2
.. ..$ :List of 2
.. .. ..$ message: chr "blabla"
.. .. ..$ id : chr "abc"
.. ..$ :List of 2
.. .. ..$ message: chr "humbug"
.. .. ..$ id : chr "jkl"
$ id: chr "123"
How can I remove all the elements with name id in all levels of the list? i.e. the expected output is
> str(x)
List of 2
$ a:List of 1
..$ from: chr "me"
$ b:List of 1
..$ comment:List of 2
.. ..$ :List of 1
.. .. ..$ message: chr "blabla"
.. ..$ :List of 1
.. .. ..$ message: chr "humbug"
Solutions using rlist package would be particularly welcome, but I'm happy with anything that works.

Recursion is also how I did it:
# recursive function to remove name from all levels of list
stripname <- function(x, name) {
thisdepth <- depth(x)
if (thisdepth == 0) {
return(x)
} else if (length(nameIndex <- which(names(x) == name))) {
x <- x[-nameIndex]
}
return(lapply(x, stripname, name))
}
# function to find depth of a list element
# see http://stackoverflow.com/questions/13432863/determine-level-of-nesting-in-r
depth <- function(this, thisdepth=0){
if (!is.list(this)) {
return(thisdepth)
} else{
return(max(unlist(lapply(this,depth,thisdepth=thisdepth+1))))
}
}
str(stripname(x, "id"))
## List of 2
## $ a:List of 1
## ..$ from: chr "me"
## $ b:List of 1
## ..$ comment:List of 2
## .. ..$ :List of 1
## .. ..$ :List of 1
## .. .. ..$ message: chr "blabla"
## .. .. ..$ message: chr "humbug"

Try a recursive function in the veins of
f <- function(i)
lapply(i, function(x)
if (is.list(x)) {
if(!is.null(names(x))) f(x[names(x)!="id"]) else f(x)
} else x
)
str(f(x[names(x)!="id"]))
# List of 2
# $ a:List of 1
# ..$ from: chr "me"
# $ b:List of 1
# ..$ comment:List of 2
# .. ..$ :List of 1
# .. .. ..$ message: chr "blabla"
# .. ..$ :List of 1
# .. .. ..$ message: chr "humbug"

This is an old question, but this can also be done quite conveniently with rrapply() in the rrapply-package (revisit of base rapply()):
rrapply::rrapply(
x, ## nested list
condition = \(x, .xname) .xname != "id", ## filter condition
how = "prune" ## how to structure result
) |>
str()
#> List of 2
#> $ a:List of 1
#> ..$ from: chr "me"
#> $ b:List of 1
#> ..$ comment:List of 2
#> .. ..$ :List of 1
#> .. .. ..$ message: chr "blabla"
#> .. ..$ :List of 1
#> .. .. ..$ message: chr "humbug"

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Ensure that data frames become tibbles when reading MongoDB data with {mongolite} - r

Related

handling lists in lists to Dataframe in R

Tibble columns of class tibble instead of class data frame

Insert elements into a list based on depth and `if` conditions using modify_depth and modify_if (purrr)

Convert Nested List into data.frame with different column length

Removing elements in a nested R list by name

Categories

Resources