Extracting elements of nested list by contained value using purrr

I have a nested list where each nested list has the same elements but not in the same order and the elements are not named explicitly but do have a name value within the list.
As you can see in the structure the list containing the 'Date' field appears at the second position in the first list and the third position in the second list, so I can't extract at a position.
I would like to extract the lists where name : is Date and keep the value associated with it, using the purrr package.
STRUCTURE
dplyr::glimpse(my_list)
List of 2
$ :List of 10
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 15:20:40 -0800"
..$ :List of 2
.. ..$ name : chr "References"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "In-Reply-To"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr "Re:"
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Cc"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\""
$ :List of 7
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 12:18:32 -0800"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr "Daniel Seneca <senecad#gene.com>"
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr "Daniel Seneca <seneca.daniel#gene.com>"
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\""
DATA
my_list <-list(list(list(name = "MIME-Version", value = "1.0"), list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"), list(name = "References", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"), list(name = "In-Reply-To", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"),list(name = "Message-ID", value = "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"), list(name = "Subject", value = "Re:"), list(name = "From", value = ""),list(name = "To", value = ""),list(name = "Cc", value = ""),list(name = "Content-Type", value = "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\"")),
list(list(name = "MIME-Version", value = "1.0"), list(name = "Message-ID",value = "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"),list(name = "Date",value = "Wed, 13 Feb 2019 12:18:32 -0800"), list(name = "Subject", value = ""), list(name = "From",value = "Daniel Seneca <senecad#gene.com>"), list(name = "To", value = "Daniel Seneca <seneca.daniel#gene.com>"),list(name = "Content-Type", value = "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\"")))

The question does not define what the output is supposed to look like so we will assume that it should be a list of lists. Remove the first layer using flatten as shown and then filter its elements using keep.
library(purrr)
my_list %>%
flatten %>%
keep(~ .x$name == "Date")
In base R this could be written:
Filter(function(x) x$name == "Date", do.call("c", my_list))
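If only the date strings are needed, one per message, a base R sketch along the following lines may help. It uses an abbreviated copy of my_list, and get_header is a hypothetical helper name that is not part of the original answer.

```r
# Abbreviated version of my_list from the question (two messages, fewer headers).
my_list <- list(
  list(
    list(name = "MIME-Version", value = "1.0"),
    list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"),
    list(name = "Subject", value = "Re:")
  ),
  list(
    list(name = "MIME-Version", value = "1.0"),
    list(name = "Subject", value = ""),
    list(name = "Date", value = "Wed, 13 Feb 2019 12:18:32 -0800")
  )
)

# Hypothetical helper: return the value of the first header called `header`
# in one message, or NA if the message has no such header.
get_header <- function(msg, header) {
  hit <- Filter(function(h) identical(h$name, header), msg)
  if (length(hit) == 0) NA_character_ else hit[[1]]$value
}

# One date string per message, regardless of where "Date" sits in each message.
dates <- vapply(my_list, get_header, character(1), header = "Date")
print(dates)
```

Unlike flattening first, this keeps one result per message, so a message without a Date header shows up as NA instead of silently disappearing.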

Related

Plotly in R: How to reference and extract figure values?

I want to know how can I access, extract, and reference values from a plotly figure in R.
Consider, for example, the Sankey diagram from plotly's own site of which there is an abbreviated version here:
library(plotly)
fig <- plot_ly(
type = "sankey",
node = list(
label = c("A1", "A2", "B1", "B2", "C1", "C2"),
color = c("blue", "blue", "blue", "blue", "blue", "blue"),
line = list()
),
link = list(
source = c(0,1,0,2,3,3),
target = c(2,3,3,4,4,5),
value = c(8,4,2,8,4,2)
)
)
fig
If I do View(fig) in RStudio, a new tab opens titled "." (I don't know why this instead of 'fig'). In this tab I can go to x > visdat > 'string of letters and numbers that is a function?' > attrs > node > x.
Here all the x coordinates for the Sankey nodes appear.
I want to access these values so I can use them somewhere else. How do I do this? If I click on the right side of the RStudio tab to copy the code to the console I get:
environment(.[["x"]][["visdat"]][["484c3ec36899"]])[["attrs"]][["node"]][["x"]]
which obviously doesn't work, as there is no object named ".".
In this case I have tried fig$x$visdat$`484c3ec36899`(), but I can't do fig$x$visdat$`484c3ec36899`()$attr, and I don't know what else to try.
So, how can I access any value from a plotly object? Any documentation referencing this topic would also be helpful.
Thanks.
You can find the documentation of the data structure of plotly in R here: https://plotly.com/r/figure-structure/
To check the data structure you can use str(fig):
List of 8
$ x :List of 6
..$ visdat :List of 1
.. ..$ a3b8795a4:function ()
..$ cur_data: chr "a3b8795a4"
..$ attrs :List of 1
.. ..$ a3b8795a4:List of 6
.. .. ..$ node :List of 3
.. .. .. ..$ label: chr [1:6] "A1" "A2" "B1" "B2" ...
.. .. .. ..$ color: chr [1:6] "blue" "blue" "blue" "blue" ...
.. .. .. ..$ line : list()
.. .. ..$ link :List of 3
.. .. .. ..$ source: num [1:6] 0 1 0 2 3 3
.. .. .. ..$ target: num [1:6] 2 3 3 4 4 5
.. .. .. ..$ value : num [1:6] 8 4 2 8 4 2
.. .. ..$ alpha_stroke: num 1
.. .. ..$ sizes : num [1:2] 10 100
.. .. ..$ spans : num [1:2] 1 20
.. .. ..$ type : chr "sankey"
..$ layout :List of 3
.. ..$ width : NULL
.. ..$ height: NULL
.. ..$ margin:List of 4
.. .. ..$ b: num 40
.. .. ..$ l: num 60
.. .. ..$ t: num 25
.. .. ..$ r: num 10
..$ source : chr "A"
..$ config :List of 1
.. ..$ showSendToCloud: logi FALSE
..- attr(*, "TOJSON_FUNC")=function (x, ...)
$ width : NULL
$ height : NULL
$ sizingPolicy :List of 6
..$ defaultWidth : chr "100%"
..$ defaultHeight: num 400
..$ padding : NULL
..$ viewer :List of 6
.. ..$ defaultWidth : NULL
.. ..$ defaultHeight: NULL
.. ..$ padding : NULL
.. ..$ fill : logi TRUE
.. ..$ suppress : logi FALSE
.. ..$ paneHeight : NULL
..$ browser :List of 5
.. ..$ defaultWidth : NULL
.. ..$ defaultHeight: NULL
.. ..$ padding : NULL
.. ..$ fill : logi TRUE
.. ..$ external : logi FALSE
..$ knitr :List of 3
.. ..$ defaultWidth : NULL
.. ..$ defaultHeight: NULL
.. ..$ figure : logi TRUE
$ dependencies :List of 5
..$ :List of 10
.. ..$ name : chr "typedarray"
.. ..$ version : chr "0.1"
.. ..$ src :List of 1
.. .. ..$ file: chr "htmlwidgets/lib/typedarray"
.. ..$ meta : NULL
.. ..$ script : chr "typedarray.min.js"
.. ..$ stylesheet: NULL
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "plotly"
.. ..$ all_files : logi FALSE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "jquery"
.. ..$ version : chr "1.11.3"
.. ..$ src :List of 1
.. .. ..$ file: chr "lib/jquery"
.. ..$ meta : NULL
.. ..$ script : chr "jquery.min.js"
.. ..$ stylesheet: NULL
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "crosstalk"
.. ..$ all_files : logi TRUE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "crosstalk"
.. ..$ version : chr "1.1.0.1"
.. ..$ src :List of 1
.. .. ..$ file: chr "www"
.. ..$ meta : NULL
.. ..$ script : chr "js/crosstalk.min.js"
.. ..$ stylesheet: chr "css/crosstalk.css"
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "crosstalk"
.. ..$ all_files : logi TRUE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "plotly-htmlwidgets-css"
.. ..$ version : chr "1.52.2"
.. ..$ src :List of 1
.. .. ..$ file: chr "htmlwidgets/lib/plotlyjs"
.. ..$ meta : NULL
.. ..$ script : NULL
.. ..$ stylesheet: chr "plotly-htmlwidgets.css"
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "plotly"
.. ..$ all_files : logi FALSE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "plotly-main"
.. ..$ version : chr "1.52.2"
.. ..$ src :List of 1
.. .. ..$ file: chr "htmlwidgets/lib/plotlyjs"
.. ..$ meta : NULL
.. ..$ script : chr "plotly-latest.min.js"
.. ..$ stylesheet: NULL
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "plotly"
.. ..$ all_files : logi FALSE
.. ..- attr(*, "class")= chr "html_dependency"
$ elementId : NULL
$ preRenderHook:function (p, registerFrames = TRUE)
$ jsHooks : list()
- attr(*, "class")= chr [1:2] "plotly" "htmlwidget"
- attr(*, "package")= chr "plotly"
You could extract the coordinates with:
unlist(fig$x$attrs)
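To pull out one specific value rather than everything at once, the str() output above suggests that the attrs entry is keyed by the same id stored in cur_data. The sketch below demonstrates that indexing on a plain list, fig_mock, which is a hand-made stand-in for the relevant part of the plotly object (an assumption, so that the example runs without plotly installed). With a real figure the same expressions would presumably be applied to fig.

```r
# Hand-made stand-in for the part of the plotly object shown by str(fig).
# The id "a3b8795a4" is copied from the str() output; it changes per session.
fig_mock <- list(
  x = list(
    cur_data = "a3b8795a4",
    attrs = list(
      a3b8795a4 = list(
        node = list(
          label = c("A1", "A2", "B1", "B2", "C1", "C2"),
          color = rep("blue", 6)
        ),
        link = list(
          source = c(0, 1, 0, 2, 3, 3),
          target = c(2, 3, 3, 4, 4, 5),
          value  = c(8, 4, 2, 8, 4, 2)
        ),
        type = "sankey"
      )
    )
  )
)

# Look the id up instead of hard-coding it, then index as with any nested list.
trace_id    <- fig_mock$x$cur_data
trace       <- fig_mock$x$attrs[[trace_id]]
node_labels <- trace$node$label
link_values <- trace$link$value

print(node_labels)
print(link_values)
```

Reading the id from cur_data avoids copying the session-specific string by hand, which is what made the expression copied from the viewer unusable.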

R - How to build a nested list without duplicated sub-list declaration?

Is there a standard approach to building nested lists in which the sub-lists all share the same structure, so that the structure is not declared repeatedly?
Dummy example structure:
a "top" list (wordsList), containing one sub-list (languageType)
this sub-list contains "sub-sub-lists" (french, english, falty), each of which contains another list with the same structure:
sentenceType, whose elements are themselves sub-lists (leBien, leMal)
How can I build such a structure without duplication, that is, assign the same sentenceType layout to every element of languageType?
Example expected output:
wordsList <- list(
languageType = list(
french = list(
sentenceType = list(
leBien = c("Bien", "le", "bonjour", "messieurs", "dames"),
leMal = c("Mal", "la", "bonsoir", "mesdames", "sieurs")
)
),
english = list(
sentenceType = list(
leBien = c("Well", "hello", "my", "good", "sire", "or", "madame"),
leMal = c("Bad", "goodbye", "your", "bad", "madame", "and", "sire")
)
),
falty = list(
sentenceType = list(
leBien = c(1:10),
leMal = list("this", "is", 0, 5, "not correct !")
)
)
)
)
str(wordsList)
List of 1
$ languageType:List of 3
..$ french :List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: chr [1:5] "Bien" "le" "bonjour" "messieurs" ...
.. .. ..$ leMal : chr [1:5] "Mal" "la" "bonsoir" "mesdames" ...
..$ english:List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: chr [1:7] "Well" "hello" "my" "good" ...
.. .. ..$ leMal : chr [1:7] "Bad" "goodbye" "your" "bad" ...
..$ falty :List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: int [1:10] 1 2 3 4 5 6 7 8 9 10
.. .. ..$ leMal :List of 5
.. .. .. ..$ : chr "this"
.. .. .. ..$ : chr "is"
.. .. .. ..$ : num 0
.. .. .. ..$ : num 5
.. .. .. ..$ : chr "not correct !"
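One possible way to avoid repeating the sentenceType declaration is a small constructor function that wraps the two vectors in the shared layout. This is only a sketch: sentence_type is a hypothetical helper name, and other designs (for example Map over a named list of inputs) would work as well.

```r
# Hypothetical constructor: declares the shared sentenceType layout once.
sentence_type <- function(leBien, leMal) {
  list(sentenceType = list(leBien = leBien, leMal = leMal))
}

wordsList <- list(
  languageType = list(
    french = sentence_type(
      c("Bien", "le", "bonjour", "messieurs", "dames"),
      c("Mal", "la", "bonsoir", "mesdames", "sieurs")
    ),
    english = sentence_type(
      c("Well", "hello", "my", "good", "sire", "or", "madame"),
      c("Bad", "goodbye", "your", "bad", "madame", "and", "sire")
    ),
    falty = sentence_type(
      1:10,
      list("this", "is", 0, 5, "not correct !")
    )
  )
)

str(wordsList)
```

Because the layout lives in one function, adding a field to every language later means changing sentence_type only.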

build json from nested dataframe in r

I'm struggling with building a json.
I have a dataframe with 3 rows, and two columns: "id" (a list of ids), and then "text" (tweets).
df$id= c(78198310004451, 78198310004451, 88198310004453)
df$text = c("I love you", "I just got married!", "I just got a new job!")
and four other fixed variables whose values are static:
Models = c(1:7)
orgId= 1
and two attributes of the twitter id
include_outcome: logi FALSE
twitterId = 70051429
I pulled down the template json and converted it into a dataframe (see below), for an example of 3 tweets.
I cannot figure out how to generate such a structure from the pre-existing data frame I mentioned above (which I will then convert to JSON with toJSON).
List of 3
$ Models: num [1:7] 1 2 3 4 5 6 7
$ orgId : num 1
$ userData :List of 1
..$ :List of 3
.. ..$ tweets :List of 3
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I love you"
.. .. .. ..$ id : num 78198310004451
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I just got married!"
.. .. .. ..$ id : num 78198310004452
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I just got a new job!"
.. .. .. ..$ id : num 88198310004453
.. ..$ twitterId : num 70051429
.. ..$ include_outcome: logi FALSE
Here's the dput output:
list(Models = c(1, 2, 3, 4, 5, 6, 7), orgId = 1, userData = list(
list(tweets = list(list(text = "I love you", id = 78198310004451),
list(text = "I just got married!", id = 78198310004452),
list(text = "I just got a new job!", id = 88198310004453)),
twitterId = 70051429, include_outcome = FALSE)))
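A minimal base R sketch of one way to get from a data frame to the nested list above: turn each row into a list(text, id) and wrap the result together with the fixed values. The data frame is rebuilt here with the ids from the dput template, and payload is a hypothetical name.

```r
# Data frame rebuilt from the question; ids follow the dput template above.
df <- data.frame(
  id   = c(78198310004451, 78198310004452, 88198310004453),
  text = c("I love you", "I just got married!", "I just got a new job!"),
  stringsAsFactors = FALSE
)

# One list(text, id) per row of the data frame.
tweets <- lapply(seq_len(nrow(df)), function(i) {
  list(text = df$text[i], id = df$id[i])
})

# Wrap the per-row lists together with the fixed values.
payload <- list(
  Models   = as.numeric(1:7),
  orgId    = 1,
  userData = list(
    list(tweets = tweets, twitterId = 70051429, include_outcome = FALSE)
  )
)

str(payload)
```

If the jsonlite package is used for the final step, jsonlite::toJSON(payload, auto_unbox = TRUE) is one option; auto_unbox writes length-one vectors as scalars instead of arrays. It is worth checking that the large ids come out unchanged in the resulting JSON.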

Insert elements into a list based on depth and `if` conditions using modify_depth and modify_if (purrr)

I'm learning some purrr commands, specifically the modify_* family of functions. I'm attempting to add price bins to items found in a grocery store (see below for my attempt and error code).
library(tidyverse)
Data
easybuy <- list(
"5520 N Division St, Spokane, WA 99208, USA",
list("bananas", "oranges"),
canned = list("olives", "fish", "jam"),
list("pork", "beef"),
list("hammer", "tape")
) %>%
map(list) %>%
# name the sublists
set_names(c("address",
"fruit",
"canned",
"meat",
"other")) %>%
# except for address, names the sublists "items"
modify_at(c(2:5), ~ set_names(.x, "items"))
Take a peek:
glimpse(easybuy)
#> List of 5
#> $ address:List of 1
#> ..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
#> $ fruit :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "bananas"
#> .. ..$ : chr "oranges"
#> $ canned :List of 1
#> ..$ items:List of 3
#> .. ..$ : chr "olives"
#> .. ..$ : chr "fish"
#> .. ..$ : chr "jam"
#> $ meat :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "pork"
#> .. ..$ : chr "beef"
#> $ other :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "hammer"
#> .. ..$ : chr "tape"
My Attempt
Idea: go in a depth of two, and look for "items", append a "price". I'm not sure if I can nest the modify functions like this.
easybuy %>%
modify_depth(2, ~ modify_at(., "items", ~ append("price")))
#> Error: character indexing requires a named object
Desired
I would like the following structure (note the addition of "price" under each item):
List of 5
$ address:List of 1
..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
$ fruit :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "bananas"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "oranges"
.. .. ..$ : chr "price"
$ canned :List of 1
..$ items:List of 3
.. ..$ :List of 2
.. .. ..$ : chr "olives"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "fish"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "jam"
.. .. ..$ : chr "price"
$ meat :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "pork"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "beef"
.. .. ..$ : chr "price"
$ other :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "hammer"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "tape"
.. .. ..$ : chr "price"
This seems to work. map_if with the predicate function(x) !is.null(names(x)) makes sure the change only happens when the element has names (that is, for the sublists holding "items", but not for the address). ~modify_depth(.x, 2, function(y) list(y, "price")) creates the list you need.
library(tidyverse)
easybuy2 <- easybuy %>%
map_if(function(x) !is.null(names(x)),
~modify_depth(.x, 2, function(y) list(y, "price")))
Here is what the second item looks like.
easybuy2[[2]][[1]]
# [[1]]
# [[1]][[1]]
# [1] "bananas"
#
# [[1]][[2]]
# [1] "price"
#
#
# [[2]]
# [[2]][[1]]
# [1] "oranges"
#
# [[2]][[2]]
# [1] "price"
Or this also works.
easybuy3 <- easybuy %>%
modify_at(2:5, ~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy3)
# [1] TRUE
Update
easybuy4 <- easybuy %>%
map_if(function(x){
name <- names(x)
if(is.null(name)){
return(FALSE)
} else {
return(name %in% "items")
}
},
~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy4)
# [1] TRUE
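For comparison, the same transformation can be sketched in base R without purrr. Here easybuy is rebuilt by hand to match the glimpse() output in the question, and add_price is a hypothetical helper name.

```r
# easybuy rebuilt by hand to match the glimpse() output in the question.
easybuy <- list(
  address = list("5520 N Division St, Spokane, WA 99208, USA"),
  fruit   = list(items = list("bananas", "oranges")),
  canned  = list(items = list("olives", "fish", "jam")),
  meat    = list(items = list("pork", "beef")),
  other   = list(items = list("hammer", "tape"))
)

# Hypothetical helper: pair every item with "price" when the section
# consists of a single element named "items"; leave other sections alone.
add_price <- function(section) {
  if (!identical(names(section), "items")) {
    return(section)
  }
  section$items <- lapply(section$items, function(item) list(item, "price"))
  section
}

easybuy_base <- lapply(easybuy, add_price)
str(easybuy_base$fruit)
```

The explicit names() check plays the role of the map_if predicate: the unnamed address section is passed through unchanged.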

Extract elements from different levels of a nested list

I have a nested list of academic authors such as:
> str(content)
List of 3
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ #status : chr "found"
.. ..$ #_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
.. .. ..$ document-count: chr "6"
.. .. ..$ cited-by-count: chr "13"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "García Cruz"
.. .. ..$ given-name: chr "Gustavo Adolfo"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ #status : chr "found"
.. ..$ #_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
.. .. ..$ document-count: chr "4"
.. .. ..$ cited-by-count: chr "21"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "5"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Akimov"
.. .. ..$ given-name: chr "Alexey"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ #status : chr "found"
.. ..$ #_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
.. .. ..$ document-count: chr "10"
.. .. ..$ cited-by-count: chr "117"
.. ..$ h-index : chr "6"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Alecke"
.. .. ..$ given-name: chr "Björn"
I am interested in extracting the following values:
dc:identifier, document-count, cited-by-count, h-index,
coauthor-count, surname, given-name
And parsing them in a data-frame like structure.
I have two issues: the first one is that I can't access the different levels of my list. Indeed, while content[[3]] returns the elements of the third sub-list/author, I have not found a way to access the sublists of the third author, that is:
> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds
I also imagine that once I can access it, I cannot simply use sapply, as the elements I'd like to parse from my list are not at the same levels.
I paste the dput of the first three elements of my list:
structure(list(`author-retrieval-response` = list(structure(list(
`#status` = "found", `#_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6",
`cited-by-count` = "13"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "García Cruz",
`given-name` = "Gustavo Adolfo"), .Names = c("surname",
"given-name"))), .Names = c("#status", "#_fa", "coredata",
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`#status` = "found", `#_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4",
`cited-by-count` = "21"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5",
`preferred-name` = structure(list(surname = "Akimov",
`given-name` = "Alexey"), .Names = c("surname", "given-name"
))), .Names = c("#status", "#_fa", "coredata", "h-index",
"coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`#status` = "found", `#_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10",
`cited-by-count` = "117"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "Alecke",
`given-name` = "Björn"), .Names = c("surname", "given-name"
))), .Names = c("#status", "#_fa", "coredata", "h-index",
"coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response",
"author-retrieval-response", "author-retrieval-response"))
Many thanks for your help!
Consider using rapply (the recursive apply function) to flatten all nested child and grandchild elements, inside an lapply that runs across the three top-level elements. Then transpose the result with t() and pass it to data.frame(). Here content is the list from the question.
flat_list <- lapply(content, function(x) data.frame(t(rapply(x, function(x) x[1]))))
final_df <- do.call(rbind, unname(flat_list))
Output
final_df
# X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1 found true AUTHOR_ID:55604964500 6 13 3 7 García Cruz Gustavo Adolfo
# 2 found true AUTHOR_ID:56595713900 4 21 3 5 Akimov Alexey
# 3 found true AUTHOR_ID:12792624600 10 117 6 7 Alecke Björn
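On the first issue in the question: according to the str() output, each top-level element is a list of length one, so content[[3]][[2]] is out of bounds and the author record sits at content[[3]][[1]]. The sketch below shows that indexing on a two-author version of the data and picks only the requested fields by name. make_author is a hypothetical helper used just to build the sample, and the column names are illustrative.

```r
# Hypothetical helper that builds one author record shaped like the dput output.
make_author <- function(id, docs, cites, h, co, surname, given) {
  list(
    `#status` = "found", `#_fa` = "true",
    coredata = list(`dc:identifier` = id, `document-count` = docs,
                    `cited-by-count` = cites),
    `h-index` = h, `coauthor-count` = co,
    `preferred-name` = list(surname = surname, `given-name` = given)
  )
}

# Two of the three authors from the question.
content <- list(
  `author-retrieval-response` = list(
    make_author("AUTHOR_ID:56595713900", "4", "21", "3", "5", "Akimov", "Alexey")),
  `author-retrieval-response` = list(
    make_author("AUTHOR_ID:12792624600", "10", "117", "6", "7", "Alecke", "Björn"))
)

# Each top-level element has length one, so the record is at [[1]].
second_id <- content[[2]][[1]]$coredata$`dc:identifier`

# Pick the wanted fields by name, one data frame row per author.
rows <- lapply(content, function(resp) {
  a <- resp[[1]]
  data.frame(
    identifier     = a$coredata$`dc:identifier`,
    document_count = a$coredata$`document-count`,
    cited_by_count = a$coredata$`cited-by-count`,
    h_index        = a$`h-index`,
    coauthor_count = a$`coauthor-count`,
    surname        = a$`preferred-name`$surname,
    given_name     = a$`preferred-name`$`given-name`,
    stringsAsFactors = FALSE
  )
})
authors_df <- do.call(rbind, unname(rows))
print(authors_df)
```

All values stay character strings, as in the source data; the count columns could be converted with as.integer() afterwards if numbers are needed.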
