I have a nested list of academic authors such as:
> str(content)
List of 3
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ #status : chr "found"
.. ..$ #_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:55604964500"
.. .. ..$ document-count: chr "6"
.. .. ..$ cited-by-count: chr "13"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "García Cruz"
.. .. ..$ given-name: chr "Gustavo Adolfo"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ #status : chr "found"
.. ..$ #_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:56595713900"
.. .. ..$ document-count: chr "4"
.. .. ..$ cited-by-count: chr "21"
.. ..$ h-index : chr "3"
.. ..$ coauthor-count: chr "5"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Akimov"
.. .. ..$ given-name: chr "Alexey"
$ author-retrieval-response:List of 1
..$ :List of 6
.. ..$ #status : chr "found"
.. ..$ #_fa : chr "true"
.. ..$ coredata :List of 3
.. .. ..$ dc:identifier : chr "AUTHOR_ID:12792624600"
.. .. ..$ document-count: chr "10"
.. .. ..$ cited-by-count: chr "117"
.. ..$ h-index : chr "6"
.. ..$ coauthor-count: chr "7"
.. ..$ preferred-name:List of 2
.. .. ..$ surname : chr "Alecke"
.. .. ..$ given-name: chr "Björn"
I am interested in extracting the following values:
dc:identifier, document-count, cited-by-count, h-index,
coauthor-count, surname, given-name
and putting them into a data-frame-like structure.
I have two issues. The first is that I cannot access the different levels of my list. While content[[3]] returns the elements of the third sub-list/author, I have not found a way to access the sub-lists of the third author, that is:
> content[[3]][[2]]
Error in content[[3]][[2]] : subscript out of bounds
The second is that, once I can access them, I imagine I cannot simply use sapply, because the elements I'd like to extract are not all at the same level of the list.
I paste the dput of the first three elements of my list:
structure(list(`author-retrieval-response` = list(structure(list(
`#status` = "found", `#_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:55604964500", `document-count` = "6",
`cited-by-count` = "13"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "García Cruz",
`given-name` = "Gustavo Adolfo"), .Names = c("surname",
"given-name"))), .Names = c("#status", "#_fa", "coredata",
"h-index", "coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`#status` = "found", `#_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:56595713900", `document-count` = "4",
`cited-by-count` = "21"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "3", `coauthor-count` = "5",
`preferred-name` = structure(list(surname = "Akimov",
`given-name` = "Alexey"), .Names = c("surname", "given-name"
))), .Names = c("#status", "#_fa", "coredata", "h-index",
"coauthor-count", "preferred-name"))), `author-retrieval-response` = list(
structure(list(`#status` = "found", `#_fa` = "true", coredata = structure(list(
`dc:identifier` = "AUTHOR_ID:12792624600", `document-count` = "10",
`cited-by-count` = "117"), .Names = c("dc:identifier",
"document-count", "cited-by-count")), `h-index` = "6", `coauthor-count` = "7",
`preferred-name` = structure(list(surname = "Alecke",
`given-name` = "Björn"), .Names = c("surname", "given-name"
))), .Names = c("#status", "#_fa", "coredata", "h-index",
"coauthor-count", "preferred-name")))), .Names = c("author-retrieval-response",
"author-retrieval-response", "author-retrieval-response"))
Many thanks for your help!
Consider rapply (a recursive apply function) to flatten all nested child and grandchild elements, called inside an lapply that runs across the three top-level elements. Transpose each result with t() and pass it to data.frame(), then row-bind the pieces. Here my_list is the list from the question (called content there).
flat_list <- lapply(my_list, function(x) data.frame(t(rapply(x, function(x) x[1]))))
final_df <- do.call(rbind, unname(flat_list))
Output
final_df
# X.status X._fa coredata.dc.identifier coredata.document.count coredata.cited.by.count h.index coauthor.count preferred.name.surname preferred.name.given.name
# 1 found true AUTHOR_ID:55604964500 6 13 3 7 García Cruz Gustavo Adolfo
# 2 found true AUTHOR_ID:56595713900 4 21 3 5 Akimov Alexey
# 3 found true AUTHOR_ID:12792624600 10 117 6 7 Alecke Björn
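The subscript error in the question comes from the extra unnamed level: each top-level element is a list of length 1, so the six fields sit under [[1]], not beside it. The following is a minimal base R sketch, not the answerer's method, that steps through that level and pulls out only the seven requested fields. The helper names make_author and extract_author, the output column names, and the two-author stand-in for content are assumptions made for illustration.

```r
# Hypothetical helper that builds one author entry with the same shape as in
# the question: a length-1 unnamed list wrapping the six named fields.
make_author <- function(id, docs, cited, h, coauthors, surname, given) {
  list(list(
    `#status` = "found", `#_fa` = "true",
    coredata = list(`dc:identifier` = id,
                    `document-count` = docs,
                    `cited-by-count` = cited),
    `h-index` = h, `coauthor-count` = coauthors,
    `preferred-name` = list(surname = surname, `given-name` = given)
  ))
}

# Two-author stand-in for `content` (values copied from the question)
content <- list(
  `author-retrieval-response` =
    make_author("AUTHOR_ID:56595713900", "4", "21", "3", "5", "Akimov", "Alexey"),
  `author-retrieval-response` =
    make_author("AUTHOR_ID:12792624600", "10", "117", "6", "7", "Alecke", "Björn")
)

# content[[2]] has length 1, so content[[2]][[2]] is out of bounds;
# step through [[1]] first, then index by name.
content[[2]][[1]][["coredata"]][["dc:identifier"]]

# Pull the seven requested fields out of one author entry
extract_author <- function(entry) {
  rec <- entry[[1]]
  data.frame(
    identifier     = rec[["coredata"]][["dc:identifier"]],
    document_count = as.integer(rec[["coredata"]][["document-count"]]),
    cited_by_count = as.integer(rec[["coredata"]][["cited-by-count"]]),
    h_index        = as.integer(rec[["h-index"]]),
    coauthor_count = as.integer(rec[["coauthor-count"]]),
    surname        = rec[["preferred-name"]][["surname"]],
    given_name     = rec[["preferred-name"]][["given-name"]],
    stringsAsFactors = FALSE
  )
}

# One row per author; unname() avoids duplicated row-name prefixes
authors_df <- do.call(rbind, lapply(unname(content), extract_author))
authors_df
```

Compared with the rapply approach, this version names each field explicitly, which is more verbose but keeps only the wanted columns and converts the counts to integers.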
Related
Is there a standard approach to building nested lists in which the sub-lists share the same structure, so as to avoid duplication?
Dummy example structure:
a "top" list (wordsList), containing one sub-list (languageType)
this sub-list contains "sub-sub-lists" (french, english, falty), each of which contains another list with the same structure:
sentenceType, whose elements are themselves sub-lists (leBien, leMal)
How can I build such a structure without duplication (that is, assign the sentenceType list to each element of languageType)?
Example expected output:
wordsList <- list(
languageType = list(
french = list(
sentenceType = list(
leBien = c("Bien", "le", "bonjour", "messieurs", "dames"),
leMal = c("Mal", "la", "bonsoir", "mesdames", "sieurs")
)
),
english = list(
sentenceType = list(
leBien = c("Well", "hello", "my", "good", "sire", "or", "madame"),
leMal = c("Bad", "goodbye", "your", "bad", "madame", "and", "sire")
)
),
falty = list(
sentenceType = list(
leBien = c(1:10),
leMal = list("this", "is", 0, 5, "not correct !")
)
)
)
)
str(wordsList)
List of 1
$ languageType:List of 3
..$ french :List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: chr [1:5] "Bien" "le" "bonjour" "messieurs" ...
.. .. ..$ leMal : chr [1:5] "Mal" "la" "bonsoir" "mesdames" ...
..$ english:List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: chr [1:7] "Well" "hello" "my" "good" ...
.. .. ..$ leMal : chr [1:7] "Bad" "goodbye" "your" "bad" ...
..$ falty :List of 1
.. ..$ sentenceType:List of 2
.. .. ..$ leBien: int [1:10] 1 2 3 4 5 6 7 8 9 10
.. .. ..$ leMal :List of 5
.. .. .. ..$ : chr "this"
.. .. .. ..$ : chr "is"
.. .. .. ..$ : num 0
.. .. .. ..$ : num 5
.. .. .. ..$ : chr "not correct !"
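One possible way to avoid repeating the sentenceType skeleton is a small constructor function that wraps the two vectors in the shared structure. This is only a sketch under that assumption; the name make_sentence_type is hypothetical and does not come from the question.

```r
# Hypothetical constructor: wraps two vectors in the shared
# sentenceType / leBien / leMal skeleton.
make_sentence_type <- function(leBien, leMal) {
  list(sentenceType = list(leBien = leBien, leMal = leMal))
}

wordsList <- list(
  languageType = list(
    french = make_sentence_type(
      c("Bien", "le", "bonjour", "messieurs", "dames"),
      c("Mal", "la", "bonsoir", "mesdames", "sieurs")
    ),
    english = make_sentence_type(
      c("Well", "hello", "my", "good", "sire", "or", "madame"),
      c("Bad", "goodbye", "your", "bad", "madame", "and", "sire")
    ),
    falty = make_sentence_type(
      1:10,
      list("this", "is", 0, 5, "not correct !")
    )
  )
)

str(wordsList)
```

Because the skeleton lives in one place, adding a field to every language only requires changing the constructor.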
I'm struggling with building a JSON object.
I have a data frame with 3 rows and two columns: "id" (the tweet ids) and "text" (the tweets).
df$id= c(78198310004451, 78198310004451, 88198310004453)
df$text = c("I love you", "I just got married!", "I just got a new job!")
and four other fixed variables whose values are static:
Models = c(1:7)
orgId= 1
and two attributes of the twitter id
include_outcome: logi FALSE
twitterId = 70051429
I pulled down the template JSON and converted it into an R list (see below), for an example of 3 tweets.
I cannot figure out how to generate such a structure from the data frame I mentioned above (which I will then convert to JSON with toJSON).
List of 3
$ Models: num [1:7] 1 2 3 4 5 6 7
$ orgId : num 1
$ userData :List of 1
..$ :List of 3
.. ..$ tweets :List of 3
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I love you"
.. .. .. ..$ id : num 78198310004451
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I just got married!"
.. .. .. ..$ id : num 78198310004452
.. .. ..$ :List of 2
.. .. .. ..$ text: chr "I just got a new job!"
.. .. .. ..$ id : num 88198310004453
.. ..$ twitterId : num 70051429
.. ..$ include_outcome: logi FALSE
Here's the dput output:
list(Models = c(1, 2, 3, 4, 5, 6, 7), orgId = 1, userData = list(
list(tweets = list(list(text = "I love you", id = 78198310004451),
list(text = "I just got married!", id = 78198310004452),
list(text = "I just got a new job!", id = 88198310004453)),
twitterId = 70051429, include_outcome = FALSE)))
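A minimal base R sketch of one way to assemble that template from the data frame, assuming each row becomes one list(text, id) entry. The object names tweets and payload are hypothetical, and the values are copied from the question.

```r
# Data frame and fixed values as described in the question
df <- data.frame(
  id   = c(78198310004451, 78198310004451, 88198310004453),
  text = c("I love you", "I just got married!", "I just got a new job!"),
  stringsAsFactors = FALSE
)
Models          <- c(1:7)
orgId           <- 1
twitterId       <- 70051429
include_outcome <- FALSE

# One list(text, id) per row; unname() drops the names that Map()
# derives from the character vector df$text.
tweets <- unname(Map(function(text, id) list(text = text, id = id),
                     df$text, df$id))

# Nest everything in the same shape as the template
payload <- list(
  Models   = Models,
  orgId    = orgId,
  userData = list(list(
    tweets          = tweets,
    twitterId       = twitterId,
    include_outcome = include_outcome
  ))
)

str(payload)
# The list could then be passed to toJSON, for example
# jsonlite::toJSON(payload, auto_unbox = TRUE), assuming jsonlite is installed.
```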
I'm learning some purrr commands, specifically the modify_* family of functions. I'm attempting to add price bins to items found in a grocery store (see below for my attempt and the error message).
library(tidyverse)
Data
easybuy <- list(
"5520 N Division St, Spokane, WA 99208, USA",
list("bananas", "oranges"),
canned = list("olives", "fish", "jam"),
list("pork", "beef"),
list("hammer", "tape")
) %>%
map(list) %>%
# name the sublists
set_names(c("address",
"fruit",
"canned",
"meat",
"other")) %>%
# except for address, names the sublists "items"
modify_at(c(2:5), ~ set_names(.x, "items"))
Take a peek:
glimpse(easybuy)
#> List of 5
#> $ address:List of 1
#> ..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
#> $ fruit :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "bananas"
#> .. ..$ : chr "oranges"
#> $ canned :List of 1
#> ..$ items:List of 3
#> .. ..$ : chr "olives"
#> .. ..$ : chr "fish"
#> .. ..$ : chr "jam"
#> $ meat :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "pork"
#> .. ..$ : chr "beef"
#> $ other :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "hammer"
#> .. ..$ : chr "tape"
My Attempt
Idea: go to a depth of two, look for "items", and append a "price". I'm not sure whether I can nest the modify functions like this.
easybuy %>%
modify_depth(2, ~ modify_at(., "items", ~ append("price")))
#> Error: character indexing requires a named object
Desired
I would like the following structure (note the addition of "price" under each item):
List of 5
$ address:List of 1
..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
$ fruit :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "bananas"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "oranges"
.. .. ..$ : chr "price"
$ canned :List of 1
..$ items:List of 3
.. ..$ :List of 2
.. .. ..$ : chr "olives"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "fish"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "jam"
.. .. ..$ : chr "price"
$ meat :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "pork"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "beef"
.. .. ..$ : chr "price"
$ other :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "hammer"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "tape"
.. .. ..$ : chr "price"
This seems to work. map_if with the predicate function(x) !is.null(names(x)) makes sure the change only happens when the element has names, so the unnamed address entry is left alone. ~modify_depth(.x, 2, function(y) list(y, "price")) creates the list you need.
library(tidyverse)
easybuy2 <- easybuy %>%
map_if(function(x) !is.null(names(x)),
~modify_depth(.x, 2, function(y) list(y, "price")))
Here is what the second item looks like.
easybuy2[[2]][[1]]
# [[1]]
# [[1]][[1]]
# [1] "bananas"
#
# [[1]][[2]]
# [1] "price"
#
#
# [[2]]
# [[2]][[1]]
# [1] "oranges"
#
# [[2]][[2]]
# [1] "price"
This also works:
easybuy3 <- easybuy %>%
modify_at(2:5, ~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy3)
# [1] TRUE
Update
easybuy4 <- easybuy %>%
map_if(function(x){
name <- names(x)
if(is.null(name)){
return(FALSE)
} else {
return(name %in% "items")
}
},
~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy4)
# [1] TRUE
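For comparison, the same transformation can be sketched in base R without purrr. This is an alternative illustration rather than part of the answer above; the helper name add_price is hypothetical, and easybuy is rebuilt directly in the shape shown by glimpse().

```r
# easybuy rebuilt directly in the shape printed by glimpse() in the question
easybuy <- list(
  address = list("5520 N Division St, Spokane, WA 99208, USA"),
  fruit   = list(items = list("bananas", "oranges")),
  canned  = list(items = list("olives", "fish", "jam")),
  meat    = list(items = list("pork", "beef")),
  other   = list(items = list("hammer", "tape"))
)

# Hypothetical helper: pair every entry of `items` with "price";
# sections without an `items` element (such as address) pass through unchanged.
add_price <- function(section) {
  if (!"items" %in% names(section)) return(section)
  section$items <- lapply(section$items, function(item) list(item, "price"))
  section
}

easybuy_priced <- lapply(easybuy, add_price)
str(easybuy_priced$fruit)
```

Checking for the name "items" mirrors the predicate in the Update above: "items" %in% NULL is FALSE, so unnamed sections are skipped without a separate is.null test.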
I have a nested list in which each sub-list contains the same kind of elements, but not in the same order. The elements are not named explicitly; instead, each one carries its name in a name field.
As you can see in the structure, the element containing the 'Date' field appears at the second position in the first list and at the third position in the second list, so I can't extract it by position.
I would like to extract the elements whose name is "Date" and keep the value associated with each, using the purrr package.
STRUCTURE
dplyr::glimpse(my_list)
List of 2
$ :List of 10
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 15:20:40 -0800"
..$ :List of 2
.. ..$ name : chr "References"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "In-Reply-To"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr "Re:"
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Cc"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\""
$ :List of 7
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 12:18:32 -0800"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr "Daniel Seneca <senecad#gene.com>"
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr "Daniel Seneca <seneca.daniel#gene.com>"
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\""
DATA
my_list <-list(list(list(name = "MIME-Version", value = "1.0"), list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"), list(name = "References", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"), list(name = "In-Reply-To", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"),list(name = "Message-ID", value = "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"), list(name = "Subject", value = "Re:"), list(name = "From", value = ""),list(name = "To", value = ""),list(name = "Cc", value = ""),list(name = "Content-Type", value = "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\"")),
list(list(name = "MIME-Version", value = "1.0"), list(name = "Message-ID",value = "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"),list(name = "Date",value = "Wed, 13 Feb 2019 12:18:32 -0800"), list(name = "Subject", value = ""), list(name = "From",value = "Daniel Seneca <senecad#gene.com>"), list(name = "To", value = "Daniel Seneca <seneca.daniel#gene.com>"),list(name = "Content-Type", value = "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\"")))
The question does not define what the output is supposed to look like, so we will assume that it should be a list of lists. Remove the first layer using flatten, as shown, and then filter its elements using keep.
library(purrr)
my_list %>%
flatten %>%
keep(~ .x$name == "Date")
In base R this could be written:
Filter(function(x) x$name == "Date", do.call("c", my_list))
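If only the date strings are wanted, one per message, a base R sketch along the following lines may help. It assumes each message has at most one "Date" header; the helper name get_header and the shortened my_list are hypothetical stand-ins for the data in the question.

```r
# Shortened stand-in for my_list: two messages with "Date" at different positions
my_list <- list(
  list(
    list(name = "MIME-Version", value = "1.0"),
    list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"),
    list(name = "Subject", value = "Re:")
  ),
  list(
    list(name = "MIME-Version", value = "1.0"),
    list(name = "Subject", value = ""),
    list(name = "Date", value = "Wed, 13 Feb 2019 12:18:32 -0800")
  )
)

# Hypothetical helper: value of the first header called `field`,
# or NA if the message has no such header.
get_header <- function(headers, field) {
  hit <- Filter(function(h) identical(h$name, field), headers)
  if (length(hit) == 0L) NA_character_ else hit[[1]]$value
}

# One date string per message, in message order
dates <- vapply(my_list, get_header, character(1), field = "Date")
dates
```

Unlike flattening first, this keeps the result aligned with the messages, so a message without a Date header shows up as NA instead of silently disappearing.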
I have this nested list:
dico <- list(list(list(c("dim.", "dimension", "dimensions", "mesures"
), c("45 cm", "45", "45 CM", "0.45m")), list(c("tamano", "volumen",
"dimension", "talla"), c("45 cm", "45", "0.45 M", "45 centimiento"
)), list(c("measures", "dimension", "measurement"), c("45 cm",
"0.45 m", "100 inches", "100 pouces"))), list(list(c("poids",
"poid", "poids net"), c("100 grammes", "100 gr", "100")), list(
c("peso", "carga", "peso especifico"), c("100 gramos", "100g",
"100", "100 g")), list(c("weight", "net wieght", "weight (grammes)"
), c("100 grams", "100", "100 g"))), list(list(c("Batterie oui/non",
"batterie", "présence batterie"), c("Oui", "batterie", "OUI"
)), list(c("bateria", "bateria si or no", "bateria disponible"
), c("si", "bateria furnindo", "1")), list(c("Battery available",
"battery", "battery yes or no"), c("yes", "Y", "Battery given"
))))
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "dim." "dimension" "dimensions" "mesures"
[[1]][[1]][[2]]
[1] "45 cm" "45" "45 CM" "0.45m"
What I want is to create a list with the same structure, but with the original values replaced by a sort of "index" name, like:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
[[1]][[1]][[2]]
[1] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
and so forth ...
Of course, the number of elements is not constant across the different nested levels. Does anyone know how to do that? I heard about rapply but could not make it work.
Try this recursive function with a 2-line body. It does not assume a fixed depth and allows unbalanced lists. No packages are used.
It accepts an object L and a level lev. If the object is not a list then it is a leaf, and the function returns the index strings of its elements. If the object is a list then it is a node, and the function iterates over its components, invoking indexer on each and passing the concatenation of lev, i and | as the ith component's level.
indexer <- function(L, lev = character(0)) {
if (!is.list(L)) paste0(lev, seq_along(L))
else Map(indexer, L, paste0(lev, seq_along(L), "|"))
}
Example 1 Using dico from the question
> str( indexer(dico) )
List of 3
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:4] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
.. ..$ : chr [1:4] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
..$ :List of 2
.. ..$ : chr [1:4] "1|2|1|1" "1|2|1|2" "1|2|1|3" "1|2|1|4"
.. ..$ : chr [1:4] "1|2|2|1" "1|2|2|2" "1|2|2|3" "1|2|2|4"
..$ :List of 2
.. ..$ : chr [1:3] "1|3|1|1" "1|3|1|2" "1|3|1|3"
.. ..$ : chr [1:4] "1|3|2|1" "1|3|2|2" "1|3|2|3" "1|3|2|4"
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:3] "2|1|1|1" "2|1|1|2" "2|1|1|3"
.. ..$ : chr [1:3] "2|1|2|1" "2|1|2|2" "2|1|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "2|2|1|1" "2|2|1|2" "2|2|1|3"
.. ..$ : chr [1:4] "2|2|2|1" "2|2|2|2" "2|2|2|3" "2|2|2|4"
..$ :List of 2
.. ..$ : chr [1:3] "2|3|1|1" "2|3|1|2" "2|3|1|3"
.. ..$ : chr [1:3] "2|3|2|1" "2|3|2|2" "2|3|2|3"
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:3] "3|1|1|1" "3|1|1|2" "3|1|1|3"
.. ..$ : chr [1:3] "3|1|2|1" "3|1|2|2" "3|1|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "3|2|1|1" "3|2|1|2" "3|2|1|3"
.. ..$ : chr [1:3] "3|2|2|1" "3|2|2|2" "3|2|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "3|3|1|1" "3|3|1|2" "3|3|1|3"
.. ..$ : chr [1:3] "3|3|2|1" "3|3|2|2" "3|3|2|3"
Example 2 Here is an example of a list with a different depth and lack of balance:
L <- list(list(1:3, 5:7), 9:10)
giving:
> str( indexer(L) )
List of 2
$ :List of 2
..$ : chr [1:3] "1|1|1" "1|1|2" "1|1|3"
..$ : chr [1:3] "1|2|1" "1|2|2" "1|2|3"
$ : chr [1:2] "2|1" "2|2"
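As a quick sanity check, an index string can be parsed back and used to look up the original element, since R's [[ accepts a vector of positions for recursive indexing. The sketch below repeats indexer from above so that it runs on its own; the helper name lookup is hypothetical.

```r
# indexer as defined in the answer above
indexer <- function(L, lev = character(0)) {
  if (!is.list(L)) paste0(lev, seq_along(L))
  else Map(indexer, L, paste0(lev, seq_along(L), "|"))
}

L <- list(list(1:3, 5:7), 9:10)

# Hypothetical helper: split "1|2|3" into positions, walk down to the leaf
# vector with recursive [[ ]], then take the last position within that vector.
lookup <- function(L, key) {
  idx <- as.integer(strsplit(key, "|", fixed = TRUE)[[1]])
  n <- length(idx)
  leaf <- if (n > 1L) L[[idx[-n]]] else L
  leaf[idx[n]]
}

lookup(L, "1|2|3")  # third element of L[[1]][[2]], i.e. of 5:7
lookup(L, "2|1")    # first element of L[[2]], i.e. of 9:10
```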
We can use melt (from reshape2) to convert the nested list to a data.frame with index columns ('L1', 'L2', 'L3') and a 'value' column, then convert it to a data.table with setDT(...). Grouped by 'L1', 'L2', 'L3', we get the sequence of rows (1:.N), paste the columns of each row into a single vector with do.call(paste, ...), and relist the result into a list with the same structure as 'dico' by passing 'dico' as the skeleton.
library(data.table)
library(reshape2)
dico2 <- relist(do.call(paste, c(setDT(melt(dico))[, 1:.N ,
by = .(L1, L2, L3)], sep="|")), skeleton = dico)
dico2
#[[1]]
#[[1]][[1]]
#[[1]][[1]][[1]]
#[1] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
#[[1]][[1]][[2]]
#[1] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
#...
#[[3]][[3]]
#[[3]][[3]][[1]]
#[1] "3|3|1|1" "3|3|1|2" "3|3|1|3"
#[[3]][[3]][[2]]
#[1] "3|3|2|1" "3|3|2|2" "3|3|2|3"