Removing elements in a nested R list by name - r

I have a nested element like this
> x <- list(a=list(from="me", id="xyz"), b=list(comment=list(list(message="blabla", id="abc"), list(message="humbug", id="jkl"))), id="123")
> str(x)
List of 3
$ a :List of 2
..$ from: chr "me"
..$ id : chr "xyz"
$ b :List of 1
..$ comment:List of 2
.. ..$ :List of 2
.. .. ..$ message: chr "blabla"
.. .. ..$ id : chr "abc"
.. ..$ :List of 2
.. .. ..$ message: chr "humbug"
.. .. ..$ id : chr "jkl"
$ id: chr "123"
How can I remove all the elements with name id in all levels of the list? i.e. the expected output is
> str(x)
List of 2
$ a:List of 1
..$ from: chr "me"
$ b:List of 1
..$ comment:List of 2
.. ..$ :List of 1
.. .. ..$ message: chr "blabla"
.. ..$ :List of 1
.. .. ..$ message: chr "humbug"
Solutions using rlist package would be particularly welcome, but I'm happy with anything that works.

Recursion is also how I did it:
# recursive function to remove name from all levels of list
stripname <- function(x, name) {
thisdepth <- depth(x)
if (thisdepth == 0) {
return(x)
} else if (length(nameIndex <- which(names(x) == name))) {
x <- x[-nameIndex]
}
return(lapply(x, stripname, name))
}
# function to find depth of a list element
# see http://stackoverflow.com/questions/13432863/determine-level-of-nesting-in-r
depth <- function(this, thisdepth=0){
if (!is.list(this)) {
return(thisdepth)
} else{
return(max(unlist(lapply(this,depth,thisdepth=thisdepth+1))))
}
}
str(stripname(x, "id"))
## List of 2
## $ a:List of 1
## ..$ from: chr "me"
## $ b:List of 1
## ..$ comment:List of 2
## .. ..$ :List of 1
## .. ..$ :List of 1
## .. .. ..$ message: chr "blabla"
## .. .. ..$ message: chr "humbug"

Try a recursive function in the veins of
f <- function(i)
lapply(i, function(x)
if (is.list(x)) {
if(!is.null(names(x))) f(x[names(x)!="id"]) else f(x)
} else x
)
str(f(x[names(x)!="id"]))
# List of 2
# $ a:List of 1
# ..$ from: chr "me"
# $ b:List of 1
# ..$ comment:List of 2
# .. ..$ :List of 1
# .. .. ..$ message: chr "blabla"
# .. ..$ :List of 1
# .. .. ..$ message: chr "humbug"

This is an old question, but this can also be done quite conveniently with rrapply() in the rrapply-package (revisit of base rapply()):
rrapply::rrapply(
x, ## nested list
condition = \(x, .xname) .xname != "id", ## filter condition
how = "prune" ## how to structure result
) |>
str()
#> List of 2
#> $ a:List of 1
#> ..$ from: chr "me"
#> $ b:List of 1
#> ..$ comment:List of 2
#> .. ..$ :List of 1
#> .. .. ..$ message: chr "blabla"
#> .. ..$ :List of 1
#> .. .. ..$ message: chr "humbug"

Related

Get data to be usable

I have been trying to get the data from this link to be usable
url <- "https://www.sec.gov/Archives/edgar/data/1061165/0001567619-21-010580.txt"
that should be the same information as the one on this link
https://www.sec.gov/Archives/edgar/data/1061165/000156761921010580/xslForm13F_X01/form13fInfoTable.xml
I have been able to download the file into a .txt, but can not get the data
Thanks
The file appears to be two nested XML files. We can extract each of the components into lists with this code:
txt <- readLines("https://www.sec.gov/Archives/edgar/data/1061165/0001567619-21-010580.txt")
grep("</?XML>", txt)
# [1] 46 101 109 719
txt[grep("</?XML>", txt)]
# [1] "<XML>" "</XML>" "<XML>" "</XML>"
A brief inspection of the file informed that grep, suggesting that an XML file started and stopped, and then another started/stopped. If we stay within that, we can extract most of the data with
library(xml2)
first <- as_list(read_xml(paste(txt[47:100], collapse = "")))
str(first)
# List of 1
# $ edgarSubmission:List of 2
# ..$ headerData:List of 2
# .. ..$ submissionType:List of 1
# .. .. ..$ : chr "13F-HR"
# .. ..$ filerInfo :List of 4
# .. .. ..$ liveTestFlag :List of 1
# .. .. .. ..$ : chr "LIVE"
# .. .. ..$ flags :List of 3
# .. .. .. ..$ confirmingCopyFlag :List of 1
# .. .. .. .. ..$ : chr "false"
# .. .. .. ..$ returnCopyFlag :List of 1
# .. .. .. .. ..$ : chr "true"
# .. .. .. ..$ overrideInternetFlag:List of 1
# .. .. .. .. ..$ : chr "false"
# .. .. ..$ filer :List of 1
# .. .. .. ..$ credentials:List of 2
# .. .. .. .. ..$ cik:List of 1
# .. .. .. .. .. ..$ : chr "0001061165"
# .. .. .. .. ..$ ccc:List of 1
# .. .. .. .. .. ..$ : chr "XXXXXXXX"
# .. .. ..$ periodOfReport:List of 1
# .. .. .. ..$ : chr "03-31-2021"
# ..$ formData :List of 3
and the second batch:
second <- as_list(read_xml(paste(txt[110:718], collapse = "")))
str(second)
# List of 1
# $ informationTable:List of 38
# ..$ infoTable:List of 7
# .. ..$ nameOfIssuer :List of 1
# .. .. ..$ : chr "ADOBE SYSTEMS INCORPORATED"
# .. ..$ titleOfClass :List of 1
# .. .. ..$ : chr "COM"
# .. ..$ cusip :List of 1
# .. .. ..$ : chr "00724F101"
# .. ..$ value :List of 1
# .. .. ..$ : chr "1246613"
# .. ..$ shrsOrPrnAmt :List of 2
# .. .. ..$ sshPrnamt :List of 1
# .. .. .. ..$ : chr "2622406"
# .. .. ..$ sshPrnamtType:List of 1
# .. .. .. ..$ : chr "SH"
# .. ..$ investmentDiscretion:List of 1
# .. .. ..$ : chr "SOLE"
# .. ..$ votingAuthority :List of 3
# .. .. ..$ Sole :List of 1
# .. .. .. ..$ : chr "2622406"
# .. .. ..$ Shared:List of 1
# .. .. .. ..$ : chr "0"
# .. .. ..$ None :List of 1
# .. .. .. ..$ : chr "0"
# ..$ infoTable:List of 7
I'm not certain offhand how to extract the front-matter, I hope this is a good enough start.

handling lists in lists to Dataframe in R

I´m new and i have some problems handling list and transform to dataframe
I have a list "ddt"
str(ddt)
List of 4
$ id : chr "18136"
$ comments.data:List of 3
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T16:17:32+0000"
.. ..$ text : chr "Mocaaa"
.. ..$ id : chr "18096"
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T16:00:00+0000"
.. ..$ text : chr "Capucchino"
.. ..$ id : chr "17846"
..$ :List of 3
.. ..$ timestamp: chr "2020-05-25T14:42:53+0000"
.. ..$ text : chr "Mocachino"
.. ..$ id : chr "18037"
$ id : chr "17920"
$ comments.data:List of 1
..$ :List of 3
.. ..$ timestamp: chr "2020-05-24T15:31:30+0000"
.. ..$ text : chr "Hello"
.. ..$ id : chr "18054"
And i need this result
id timestamp text id2
1 18136 2020-05-25T16:17:32+0000 Mocaaa 18096
2 18136 2020-05-25T16:00:00+0000 Capucchino 17846
3 18136 2020-05-25T14:42:53+0000 Mocachino 18037
4 17920 2020-05-24T15:31:30+0000 Hello 18054
I think this can be done well with data.table.
set.seed(42)
df <- replicate(2, list(id = sample(1e5, 1), comments = replicate(3, list(tm = as.character(Sys.time() + sample(10, 1)), text = sample(LETTERS, 1), id = sample(1e5, 1)), simplify = FALSE)), simplify = FALSE)
str(df)
# List of 2
# $ :List of 2
# ..$ id : int 91481
# ..$ comments:List of 3
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:08"
# .. .. ..$ text: chr "H"
# .. .. ..$ id : int 83045
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:05"
# .. .. ..$ text: chr "N"
# .. .. ..$ id : int 73659
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:00"
# .. .. ..$ text: chr "R"
# .. .. ..$ id : int 70507
# $ :List of 2
# ..$ id : int 45775
# ..$ comments:List of 3
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:06"
# .. .. ..$ text: chr "Y"
# .. .. ..$ id : int 25543
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:03"
# .. .. ..$ text: chr "Y"
# .. .. ..$ id : int 97823
# .. ..$ :List of 3
# .. .. ..$ tm : chr "2020-05-26 14:44:00"
# .. .. ..$ text: chr "M"
# .. .. ..$ id : int 56034
One thing we'll have to contend with is that you have id on the top-level as well as internally within each list.
library(data.table)
library(magrittr) # for %>%, demonstrative only, can be done without
data.table::rbindlist(df) %>%
.[, comments := lapply(comments, as.data.table) ] %>%
# we have a duplicate name 'id', rename in the inner ones
.[, comments := lapply(comments, setnames, "id", "innerid") ] %>%
.[, unlist(comments, recursive = FALSE), by = seq_len(nrow(.)) ]
# seq_len tm text innerid
# 1: 1 2020-05-26 14:49:21 H 83045
# 2: 2 2020-05-26 14:49:18 N 73659
# 3: 3 2020-05-26 14:49:13 R 70507
# 4: 4 2020-05-26 14:49:19 Y 25543
# 5: 5 2020-05-26 14:49:16 Y 97823
# 6: 6 2020-05-26 14:49:13 M 56034
I suspect that the by=seq_len(nrow(.)) is not going to scale well to larger data. Since Rdatatable/data.table#3672 is still open, an alternative is to replace the last line (including unlist and seq_len) with just %>% tidyr::unnest(comments). I suspect that the combination of data.table and tidyr is at times contentious, I suggest that this non-partisan approach capitalizes on the strengths of both.
The structure seems to look just like a java script object.
You could do:
library(jsonlite)
library(tidyr)
unnest(unnest(fromJSON(toJSON(df))))
# A tibble: 6 x 4
id tm text id1
<int> <chr> <chr> <int>
1 92345 2020-05-26 14:53:53 X 6730
2 92345 2020-05-26 14:53:56 Q 92812
3 92345 2020-05-26 14:53:56 D 25304
4 9847 2020-05-26 14:53:56 E 82734
5 9847 2020-05-26 14:54:01 I 75079
6 9847 2020-05-26 14:54:02 H 89373

Insert elements into a list based on depth and `if` conditions using modify_depth and modify_if (purrr)

I'm learning some purrr commands, specifically the modify_* family of functions. I'm attemping to add price bins to items found in a grocery store (see below for my attempt and error code).
library(tidyverse)
Data
easybuy <- list(
"5520 N Division St, Spokane, WA 99208, USA",
list("bananas", "oranges"),
canned = list("olives", "fish", "jam"),
list("pork", "beef"),
list("hammer", "tape")
) %>%
map(list) %>%
# name the sublists
set_names(c("address",
"fruit",
"canned",
"meat",
"other")) %>%
# except for address, names the sublists "items"
modify_at(c(2:5), ~ set_names(.x, "items"))
Take a peek:
glimpse(easybuy)
#> List of 5
#> $ address:List of 1
#> ..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
#> $ fruit :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "bananas"
#> .. ..$ : chr "oranges"
#> $ canned :List of 1
#> ..$ items:List of 3
#> .. ..$ : chr "olives"
#> .. ..$ : chr "fish"
#> .. ..$ : chr "jam"
#> $ meat :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "pork"
#> .. ..$ : chr "beef"
#> $ other :List of 1
#> ..$ items:List of 2
#> .. ..$ : chr "hammer"
#> .. ..$ : chr "tape"
My Attempt
Idea: go in a depth of two, and look for "items", append a "price". I'm not sure if I can nest the modify functions like this.
easybuy %>%
modify_depth(2, ~ modify_at(., "items", ~ append("price")))
#> Error: character indexing requires a named object
Desired
I would like the following structure (note the addition of "price" under each item):
List of 5
$ address:List of 1
..$ : chr "5520 N Division St, Spokane, WA 99208, USA"
$ fruit :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "bananas"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "oranges"
.. .. ..$ : chr "price"
$ canned :List of 1
..$ items:List of 3
.. ..$ :List of 2
.. .. ..$ : chr "olives"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "fish"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "jam"
.. .. ..$ : chr "price"
$ meat :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "pork"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "beef"
.. .. ..$ : chr "price"
$ other :List of 1
..$ items:List of 2
.. ..$ :List of 2
.. .. ..$ : chr "hammer"
.. .. ..$ : chr "price"
.. ..$ :List of 2
.. .. ..$ : chr "tape"
.. .. ..$ : chr "price"
This seems working. The map_if and function(x) !is.null(names(x)) make sure the change only happen if the name of the item is not NULL. ~modify_depth(.x, 2, function(y) list(y, "price")) creates the list you need.
library(tidyverse)
easybuy2 <- easybuy %>%
map_if(function(x) !is.null(names(x)),
~modify_depth(.x, 2, function(y) list(y, "price")))
Here is how the second item looks like.
easybuy2[[2]][[1]]
# [[1]]
# [[1]][[1]]
# [1] "bananas"
#
# [[1]][[2]]
# [1] "price"
#
#
# [[2]]
# [[2]][[1]]
# [1] "oranges"
#
# [[2]][[2]]
# [1] "price"
Or this also works.
easybuy3 <- easybuy %>%
modify_at(2:5, ~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy3)
# [1] TRUE
Update
easybuy4 <- easybuy %>%
map_if(function(x){
name <- names(x)
if(is.null(name)){
return(FALSE)
} else {
return(name %in% "items")
}
},
~modify_depth(.x, 2, function(y) list(y, "price")))
identical(easybuy2, easybuy4)
# [1] TRUE

Convert Nested List into data.frame with different column length

I was trying to convert below nested list into data.frame but without luck. There are a few complications, mainly the column "results" of position 1 is inconsistent with position 2, as there is no result in position 2.
item length inconsistent across different positions
[[1]]
[[1]]$html_attributions
list()
[[1]]$results
geometry.location.lat geometry.location.lng
1 25.66544 -100.4354
id place_id
1 6ce0a030663144c8e992cbce51eb00479ef7db89 ChIJVy7b7FW9YoYRdaH2I_gOJIk
reference
1 CmRSAAAATdtVfB4Tz1aQ8GhGaw4-nRJ5lZlVNgiOR3ciF4QjmYC56bn6b7omWh1SJEWWqQQEFNXxGZndgEwSgl8sRCOtdF8aXpngUY878Q__yH4in8EMZMCIqSHLARqNgGlV4mKgEhDlvkHLXLiBW4F_KQVT83jIGhS5DJipk6PAnpPDXP2p-4X5NPuG9w
[[1]]$status
[1] "OK"
[[2]]
[[2]]$html_attributions
list()
[[2]]$results
list()
[[2]]$status
[1] "ZERO_RESULTS"
I tried the following codes but they aint' working.
#1
m1 <- do.call(rbind, lapply(myDataFrames, function(y) do.call(rbind, y)))
relist(m1, skeleton = myDataFrames)
#2
relist(matrix(unlist(myDataFrames), ncol = 4, byrow = T), skeleton = myDataFrames)
#3
library(data.table)
df<-rbindlist(myDataFrames, idcol = "index")
df<-rbindlist(myDataFrames, fill=TRUE)
#4
myDataFrame <- do.call(rbind.data.frame, c(myDataFrames, list(stringsAsFactors = FALSE)))
I think I have enough of the original JSON to be able to create a reproducible example:
okjson <- '{"html_attributions":[],"results":[{"geometry":{"location":{"lat":25.66544,"lon":-100.4354},"id":"foo","place_id":"quux"}}],"status":"OK"}'
emptyjson <- '{"html_attributions":[],"results":[],"status":"ZERO_RESULTS"}'
jsons <- list(okjson, emptyjson, okjson)
From here, I'll step (slowly) through the process. I've included much of the intermediate structure for reproducibility, I apologize for the verbosity. This can easily be grouped together and/or put within a magrittr pipeline.
lists <- lapply(jsons, jsonlite::fromJSON)
str(lists)
# List of 3
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results :'data.frame': 1 obs. of 1 variable:
# .. ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ lat: num 25.7
# .. .. .. ..$ lon: num -100
# .. .. ..$ id : chr "foo"
# .. .. ..$ place_id: chr "quux"
# ..$ status : chr "OK"
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results : list()
# ..$ status : chr "ZERO_RESULTS"
# $ :List of 3
# ..$ html_attributions: list()
# ..$ results :'data.frame': 1 obs. of 1 variable:
# .. ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. .. ..$ lat: num 25.7
# .. .. .. ..$ lon: num -100
# .. .. ..$ id : chr "foo"
# .. .. ..$ place_id: chr "quux"
# ..$ status : chr "OK"
goodlists <- Filter(function(a) "results" %in% names(a) && length(a$results) > 0, lists)
goodresults <- lapply(goodlists, `[[`, "results")
str(goodresults)
# List of 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. ..$ lat: num 25.7
# .. .. ..$ lon: num -100
# .. ..$ id : chr "foo"
# .. ..$ place_id: chr "quux"
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ geometry:'data.frame': 1 obs. of 3 variables:
# .. ..$ location:'data.frame': 1 obs. of 2 variables:
# .. .. ..$ lat: num 25.7
# .. .. ..$ lon: num -100
# .. ..$ id : chr "foo"
# .. ..$ place_id: chr "quux"
goodresultsdf <- lapply(goodresults, function(a) jsonlite::flatten(as.data.frame(a)))
str(goodresultsdf)
# List of 2
# $ :'data.frame': 1 obs. of 4 variables:
# ..$ geometry.id : chr "foo"
# ..$ geometry.place_id : chr "quux"
# ..$ geometry.location.lat: num 25.7
# ..$ geometry.location.lon: num -100
# $ :'data.frame': 1 obs. of 4 variables:
# ..$ geometry.id : chr "foo"
# ..$ geometry.place_id : chr "quux"
# ..$ geometry.location.lat: num 25.7
# ..$ geometry.location.lon: num -100
We now have a list-of-data.frames, a good place to be.
do.call(rbind.data.frame, c(goodresultsdf, stringsAsFactors = FALSE))
# geometry.id geometry.place_id geometry.location.lat geometry.location.lon
# 1 foo quux 25.66544 -100.4354
# 2 foo quux 25.66544 -100.4354

R - Register the Errors as a new Variable/Vector/Column

I am learning the basics of R and I am currently using tryCatch to continue a loop even when an error is encountered. It basically looks like this:
for (variableloop in (1:10000)){
tryCatch({
My function/ formula goes here
},error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
I was wondering if there is a command to save up the list of cases where the loop provided an error.
Thank you very much for your time.
What you want is to for each call to your function to return both the result and the error, where exactly one of the two is empty. Something like this (using base R):
# bigger loop than this ...
input <- 1:5
myfunc <- function(ign) if ( (x <- runif(1)) < 0.2) stop(paste0("some error: ", x)) else x
set.seed(2)
ret <- lapply(input, function(i) {
tryCatch(list(result = myfunc(i), error = NA),
error = function(e) list(result = NA, error = e))
})
str(ret)
# List of 5
# $ :List of 2
# ..$ result: logi NA
# ..$ error :List of 2
# .. ..$ message: chr "some error: 0.18488225992769"
# .. ..$ call : language myfunc(i)
# .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ :List of 2
# ..$ result: num 0.702
# ..$ error : logi NA
# $ :List of 2
# ..$ result: num 0.573
# ..$ error : logi NA
# $ :List of 2
# ..$ result: logi NA
# ..$ error :List of 2
# .. ..$ message: chr "some error: 0.168051920365542"
# .. ..$ call : language myfunc(i)
# .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ :List of 2
# ..$ result: num 0.944
# ..$ error : logi NA
You can access just the (possibly empty) errors with:
str(lapply(ret, `[[`, "error"))
# List of 5
# $ :List of 2
# ..$ message: chr "some error: 0.18488225992769"
# ..$ call : language myfunc(i)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : logi NA
# $ : logi NA
# $ :List of 2
# ..$ message: chr "some error: 0.168051920365542"
# ..$ call : language myfunc(i)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : logi NA
You can also use the purrr package:
set.seed(2)
ret <- lapply(input, function(i) {
purrr::safely(myfunc)(i)
})
str(lapply(ret, `[[`, "error"))
# List of 5
# $ :List of 2
# ..$ message: chr "some error: 0.18488225992769"
# ..$ call : language .f(...)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : NULL
# $ : NULL
# $ :List of 2
# ..$ message: chr "some error: 0.168051920365542"
# ..$ call : language .f(...)
# ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# $ : NULL

Resources