extract content of large lists through a loop

I have about 30 standardized large lists, and I want to extract elements that are organized the same way in all of them.
> df[[2]][[1]][[4]][[1]][[1]][[1]][[1]][[3]][[1]]
[1] "Non-histaminic angioedema"
> df[[2]][[1]][[4]][[1]][[1]][[2]][[1]][[3]][[1]]
[1] "Rare urticaria"
> df[[2]][[1]][[4]][[1]][[1]][[3]][[1]][[3]][[1]]
[1] "Rare allergic respiratory disease"
I want a loop that extracts all of this information into a data.frame. I tried to use lapply, but I'm having trouble finding the right function to pass to it.
I have never used the apply family, so I would be very grateful for some tips.
-----------------edit
str(df)
$ Availability:List of 1
..$ Licence:List of 3
.. ..$ FullName :List of 2
.. .. ..$ text : chr "Creative Commons Attribution 4.0 International"
.. .. ..$ .attrs: Named chr "en"
.. .. .. ..- attr(*, "names")= chr "lang"
.. ..$ ShortIdentifier: chr "CC-BY-4.0"
.. ..$ LegalCode : chr "https://creativecommons.org/licenses/by/4.0/legalcode"
$ DisorderList:List of 2
..$ Disorder:List of 5
.. ..$ OrphaNumber : chr "98050"
.. ..$ ExpertLink :List of 2
.. .. ..$ text : chr "http://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert=98050"
.. .. ..$ .attrs: Named chr "en"
.. .. .. ..- attr(*, "names")= chr "lang"
.. ..$ Name :List of 2
.. .. ..$ text : chr "Rare allergic disease"
.. .. ..$ .attrs: Named chr "en"
.. .. .. ..- attr(*, "names")= chr "lang"
.. ..$ ClassificationNodeList:List of 2
.. .. ..$ ClassificationNode:List of 1
.. .. .. ..$ ClassificationNodeChildList:List of 5
.. .. .. .. ..$ ClassificationNode:List of 2
.. .. .. .. .. ..$ Disorder :List of 4
.. .. .. .. .. .. ..$ OrphaNumber: chr "658"
.. .. .. .. .. .. ..$ ExpertLink :List of 2
.. .. .. .. .. .. .. ..$ text : chr "http://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert=658"
.. .. .. .. .. .. .. ..$ .attrs: Named chr "en"
.. .. .. .. .. .. .. .. ..- attr(*, "names")= chr "lang"
.. .. .. .. .. .. ..$ Name :List of 2
.. .. .. .. .. .. .. ..$ text : chr "Non-histaminic angioedema"
.. .. .. .. .. .. .. ..$ .attrs: Named chr "en"
.. .. .. .. .. .. .. .. ..- attr(*, "names")= chr "lang"
.. .. .. .. .. .. ..$ .attrs : Named chr "8618"
.. .. .. .. .. .. .. ..- attr(*, "names")= chr "id"
.. .. .. .. .. ..$ ClassificationNodeChildList:List of 3
.. .. .. .. .. .. ..$ ClassificationNode:List of 2
.. .. .. .. .. .. .. ..$ Disorder :List of 4
.. .. .. .. .. .. .. .. ..$ OrphaNumber: chr "91378"
.. .. .. .. .. .. .. .. ..$ ExpertLink :List of 2
.. .. .. .. .. .. .. .. .. ..$ text : chr "http://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert=91378"
.. .. .. .. .. .. .. .. .. ..$ .attrs: Named chr "en"
.. .. .. .. .. .. .. .. .. .. ..- attr(*, "names")= chr "lang"
.. .. .. .. .. .. .. .. ..$ Name :List of 2
.. .. .. .. .. .. .. .. .. ..$ text : chr "Hereditary angioedema"
.. .. .. .. .. .. .. .. .. ..$ .attrs: Named chr "en"
.. .. .. .. .. .. .. .. .. .. ..- attr(*, "names")= chr "lang"
.. .. .. .. .. .. .. .. ..$ .attrs : Named chr "12136"
.. .. .. .. .. .. .. .. .. ..- attr(*, "names")= chr "id"
.. .. .. .. .. .. .. ..$ ClassificationNodeChildList:List of 3
.. .. .. .. .. .. .. .. ..$ ClassificationNode:List of 2
.. .. .. .. .. .. .. .. .. ..$ Disorder :List of 4
.. .. .. .. .. .. .. .. .. .. ..$ OrphaNumber: chr "528623"
.. .. .. .. .. .. .. .. .. .. ..$ ExpertLink :List of 2
.. .. .. .. .. .. .. .. .. .. .. ..$ text : chr "http://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=en&Expert=528623"
This represents the structure of one of my lists, and I want to extract only what is contained in
df$DisorderList$Disorder$ClassificationNodeList$ClassificationNode$ClassificationNodeChildList$ClassificationNode$Disorder$Name$text
As you can see, this structure repeats throughout the file (twice in this case, because I pasted only a small part):
head(df$DisorderList$Disorder$ClassificationNodeList$ClassificationNode$ClassificationNodeChildList[1]$ClassificationNode$Disorder$Name$text)
[1] "Non-histaminic angioedema"
head(df$DisorderList$Disorder$ClassificationNodeList$ClassificationNode$ClassificationNodeChildList[2]$ClassificationNode$Disorder$Name$text)
[1] "Hereditary angioedema"
I want to extract this information from each of the 30 lists I have.

We can use sapply to loop over the index at the sixth nesting level and extract that component (assuming all other indices are constant):
sapply(1:30, function(i) df[[2]][[1]][[4]][[1]][[1]][[i]][[1]][[3]][[1]])
It is better to take the length of the list at that level (without the [[i]] index) to make this more dynamic:
l1 <- length(df[[2]][[1]][[4]][[1]][[1]])
sapply(seq_len(l1), function(i) df[[2]][[1]][[4]][[1]][[1]][[i]][[1]][[3]][[1]])
Or, using the names from the OP's updated post:
sapply(seq_len(l1), function(i)
  df$DisorderList$Disorder$ClassificationNodeList$ClassificationNode$ClassificationNodeChildList[i]$ClassificationNode$Disorder$Name$text)
Using a reproducible example
lapply(1:3, function(i) df[[2]][[1]][[4]][[1]][[1]][[i]][[1]][[2]][[1]])
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 1 2 3
#[[3]]
#[1] 1 2 3
Or with pluck
library(purrr)
map(1:3, ~ pluck(df, 2, 1, 4, 1, 1, .x, 1, 2, 1))
#[[1]]
#[1] 1 2 3
#[[2]]
#[1] 1 2 3
#[[3]]
#[1] 1 2 3
data
df <- replicate(3, replicate(2, replicate(4, replicate(4, replicate(3, replicate(5, list( list( list( 1:3, 1:5), list( 1:3, 1:5))), simplify = FALSE), simplify = FALSE), simplify = FALSE), simplify = FALSE), simplify = FALSE), simplify = FALSE)
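Since the question asks for a data.frame built from about 30 such lists, here is a minimal base R sketch of that last step. It is only an illustration: make_node(), make_list(), all_lists and extract_names() are hypothetical names, the toy lists mimic only the named path shown in the question's str() output, and "Hypothetical disorder A" is made-up data. It also assumes every child of interest is named ClassificationNode.

```r
# Toy stand-ins for the parsed lists; only the path used below is mocked.
make_node <- function(name) {
  list(Disorder = list(Name = list(text = name, .attrs = c(lang = "en"))))
}
make_list <- function(disorder_names) {
  children <- lapply(disorder_names, make_node)
  names(children) <- rep("ClassificationNode", length(children))
  list(DisorderList = list(Disorder = list(
    ClassificationNodeList = list(ClassificationNode = list(
      ClassificationNodeChildList = children)))))
}

# In practice this would hold the ~30 real lists (names are assumptions).
all_lists <- list(
  allergic = make_list(c("Non-histaminic angioedema", "Rare urticaria")),
  other    = make_list("Hypothetical disorder A")
)

# Pull Name$text from every ClassificationNode child of one list.
extract_names <- function(x) {
  kids <- x$DisorderList$Disorder$ClassificationNodeList$ClassificationNode$ClassificationNodeChildList
  kids <- kids[names(kids) == "ClassificationNode"]  # skip entries such as .attrs
  vapply(kids, function(k) k$Disorder$Name$text, character(1), USE.NAMES = FALSE)
}

# One data.frame per list, then stack them.
result <- do.call(rbind, lapply(names(all_lists), function(nm) {
  data.frame(source = nm, name = extract_names(all_lists[[nm]]),
             stringsAsFactors = FALSE)
}))
print(result)
```

vapply() is used instead of sapply() so that a node without a single character Name$text fails loudly rather than silently changing the shape of the result.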

Related

Plotly in R: How to reference and extract figure values?

I want to know how can I access, extract, and reference values from a plotly figure in R.
Consider, for example, the Sankey diagram from plotly's own site of which there is an abbreviated version here:
library(plotly)
fig <- plot_ly(
  type = "sankey",
  node = list(
    label = c("A1", "A2", "B1", "B2", "C1", "C2"),
    color = c("blue", "blue", "blue", "blue", "blue", "blue"),
    line = list()
  ),
  link = list(
    source = c(0,1,0,2,3,3),
    target = c(2,3,3,4,4,5),
    value = c(8,4,2,8,4,2)
  )
)
fig
If I do View(fig) in RStudio, a new tab opens titled "." (I don't know why it is not titled 'fig'). In this tab I can go to x > visdat > 'string of letters and numbers that is a function?' > attrs > node > x.
All the x coordinates for the Sankey nodes appear there.
I want to access these values so I can use them somewhere else. How do I do this? If I click on the right side of the RStudio tab to copy the code to the console, I get:
environment(.[["x"]][["visdat"]][["484c3ec36899"]])[["attrs"]][["node"]][["x"]]
which obviously doesn't work, as there is no object named ".".
In this case I have tried fig$x$visdat$`484c3ec36899`(), but I can't do fig$x$visdat$`484c3ec36899`()$attr, and I don't know what else to try.
So, how can I access any value from a plotly object? Any documentation on this topic would also be helpful.
Thanks.

You can find the documentation of the data structure of plotly in R here: https://plotly.com/r/figure-structure/
To check the data structure you can use str(fig):
List of 8
$ x :List of 6
..$ visdat :List of 1
.. ..$ a3b8795a4:function ()
..$ cur_data: chr "a3b8795a4"
..$ attrs :List of 1
.. ..$ a3b8795a4:List of 6
.. .. ..$ node :List of 3
.. .. .. ..$ label: chr [1:6] "A1" "A2" "B1" "B2" ...
.. .. .. ..$ color: chr [1:6] "blue" "blue" "blue" "blue" ...
.. .. .. ..$ line : list()
.. .. ..$ link :List of 3
.. .. .. ..$ source: num [1:6] 0 1 0 2 3 3
.. .. .. ..$ target: num [1:6] 2 3 3 4 4 5
.. .. .. ..$ value : num [1:6] 8 4 2 8 4 2
.. .. ..$ alpha_stroke: num 1
.. .. ..$ sizes : num [1:2] 10 100
.. .. ..$ spans : num [1:2] 1 20
.. .. ..$ type : chr "sankey"
..$ layout :List of 3
.. ..$ width : NULL
.. ..$ height: NULL
.. ..$ margin:List of 4
.. .. ..$ b: num 40
.. .. ..$ l: num 60
.. .. ..$ t: num 25
.. .. ..$ r: num 10
..$ source : chr "A"
..$ config :List of 1
.. ..$ showSendToCloud: logi FALSE
..- attr(*, "TOJSON_FUNC")=function (x, ...)
$ width : NULL
$ height : NULL
$ sizingPolicy :List of 6
..$ defaultWidth : chr "100%"
..$ defaultHeight: num 400
..$ padding : NULL
..$ viewer :List of 6
.. ..$ defaultWidth : NULL
.. ..$ defaultHeight: NULL
.. ..$ padding : NULL
.. ..$ fill : logi TRUE
.. ..$ suppress : logi FALSE
.. ..$ paneHeight : NULL
..$ browser :List of 5
.. ..$ defaultWidth : NULL
.. ..$ defaultHeight: NULL
.. ..$ padding : NULL
.. ..$ fill : logi TRUE
.. ..$ external : logi FALSE
..$ knitr :List of 3
.. ..$ defaultWidth : NULL
.. ..$ defaultHeight: NULL
.. ..$ figure : logi TRUE
$ dependencies :List of 5
..$ :List of 10
.. ..$ name : chr "typedarray"
.. ..$ version : chr "0.1"
.. ..$ src :List of 1
.. .. ..$ file: chr "htmlwidgets/lib/typedarray"
.. ..$ meta : NULL
.. ..$ script : chr "typedarray.min.js"
.. ..$ stylesheet: NULL
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "plotly"
.. ..$ all_files : logi FALSE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "jquery"
.. ..$ version : chr "1.11.3"
.. ..$ src :List of 1
.. .. ..$ file: chr "lib/jquery"
.. ..$ meta : NULL
.. ..$ script : chr "jquery.min.js"
.. ..$ stylesheet: NULL
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "crosstalk"
.. ..$ all_files : logi TRUE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "crosstalk"
.. ..$ version : chr "1.1.0.1"
.. ..$ src :List of 1
.. .. ..$ file: chr "www"
.. ..$ meta : NULL
.. ..$ script : chr "js/crosstalk.min.js"
.. ..$ stylesheet: chr "css/crosstalk.css"
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "crosstalk"
.. ..$ all_files : logi TRUE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "plotly-htmlwidgets-css"
.. ..$ version : chr "1.52.2"
.. ..$ src :List of 1
.. .. ..$ file: chr "htmlwidgets/lib/plotlyjs"
.. ..$ meta : NULL
.. ..$ script : NULL
.. ..$ stylesheet: chr "plotly-htmlwidgets.css"
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "plotly"
.. ..$ all_files : logi FALSE
.. ..- attr(*, "class")= chr "html_dependency"
..$ :List of 10
.. ..$ name : chr "plotly-main"
.. ..$ version : chr "1.52.2"
.. ..$ src :List of 1
.. .. ..$ file: chr "htmlwidgets/lib/plotlyjs"
.. ..$ meta : NULL
.. ..$ script : chr "plotly-latest.min.js"
.. ..$ stylesheet: NULL
.. ..$ head : NULL
.. ..$ attachment: NULL
.. ..$ package : chr "plotly"
.. ..$ all_files : logi FALSE
.. ..- attr(*, "class")= chr "html_dependency"
$ elementId : NULL
$ preRenderHook:function (p, registerFrames = TRUE)
$ jsHooks : list()
- attr(*, "class")= chr [1:2] "plotly" "htmlwidget"
- attr(*, "package")= chr "plotly"
You could extract the coordinates with:
unlist(fig$x$attrs)
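For more targeted access than unlist(), the str() output above suggests that the trace attributes sit under fig$x$attrs, keyed by the id stored in fig$x$cur_data. A minimal sketch of that idea, using a hand-built stand-in list (fig_like is a hypothetical name, not a real plotly object) so it runs without plotly installed:

```r
# Stand-in mirroring the relevant part of the str(fig) output above.
fig_like <- list(x = list(
  cur_data = "a3b8795a4",
  attrs = list(a3b8795a4 = list(
    node = list(
      label = c("A1", "A2", "B1", "B2", "C1", "C2"),
      color = rep("blue", 6)
    ),
    link = list(
      source = c(0, 1, 0, 2, 3, 3),
      target = c(2, 3, 3, 4, 4, 5),
      value  = c(8, 4, 2, 8, 4, 2)
    ),
    type = "sankey"
  ))
))

# Look the trace up by the current id instead of hard-coding the hash.
trace <- fig_like$x$attrs[[fig_like$x$cur_data]]

trace$node$label                  # node labels as a character vector
links <- as.data.frame(trace$link)  # source/target/value as columns
print(links)
```

With a real figure the same indexing should apply to fig itself, assuming the structure matches the str() output shown. Values that plotly computes rather than receives (such as node coordinates that were never passed in) would not be stored in attrs.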

R: Loaded tweets structure is untidy when str()

Unlike on my colleague's computer, after I load the tweets in R and inspect the structure with str(), the data appears in a messy way with a lot of dots rather than being organized as a table, even though the code is the same. I can't understand what the problem is; we have the same packages installed and the same R version.
library(rtweet)
library(ggplot2)
library(dplyr)
library(tibble)
library(tidytext)
library(stringr)
library(stringi)
library(igraph)
library(ggraph)
library(readr)
library(lubridate)
library(zoo)
appname <- ""
key <- ""
secret <- ""
twitter_token <- create_token( app = "", consumer_key = "", consumer_secret = "", access_token = "", access_secret = "")
tweets <- search_tweets(q = "#water + #climatechange", n = 10000, lang = "en", include_rts = FALSE)
str(tweets)
.. ..$ media :'data.frame': 1 obs. of 11 variables:
.. .. ..$ id : num 1.57e+18
.. .. ..$ id_str : chr "1573815153484759040"
.. .. ..$ indices :List of 1
.. .. .. ..$ :'data.frame': 1 obs. of 2 variables:
.. .. .. .. ..$ start: int 241
.. .. .. .. ..$ end : int 264
.. .. .. ..- attr(*, "class")= chr "AsIs"
.. .. ..$ media_url : chr "http://pbs.twimg.com/media/FddQiy2WAAAl59Q.jpg"
.. .. ..$ media_url_https: chr "https://pbs.twimg.com/media/FddQiy2WAAAl59Q.jpg"
.. .. ..$ url : chr "https
.. .. ..$ display_url : chr "pic.twitter.com/iFJTkF1S9S"
.. .. ..$ expanded_url : chr "https://twitter.com/TreeBanker/status/1573815156768968706/photo/1"
.. .. ..$ type : chr "photo"
.. .. ..$ sizes :List of 1
.. .. .. ..$ :'data.frame': 4 obs. of 4 variables:
.. .. .. .. ..$ w : int [1:4] 1096 680 150 1096
.. .. .. .. ..$ h : int [1:4] 733 455 150 733
.. .. .. .. ..$ resize: chr [1:4] "fit" "fit" "crop" "fit"
.. .. .. .. ..$ type : chr [1:4] "large" "small" "thumb" "medium"
.. .. ..$ ext_alt_text : logi NA
..$ :List of 5
.. ..$ media :'data.frame': 1 obs. of 11 variables:
.. .. ..$ id : num 1.57e+18
.. .. ..$ id_str : chr "1573815153484759040"
.. .. ..$ indices :List of 1
.. .. .. ..$ :'data.frame': 1 obs. of 2 variables:

Get data to be usable

I have been trying to get the data from this link into a usable form:
url <- "https://www.sec.gov/Archives/edgar/data/1061165/0001567619-21-010580.txt"
It should contain the same information as this link:
https://www.sec.gov/Archives/edgar/data/1061165/000156761921010580/xslForm13F_X01/form13fInfoTable.xml
I have been able to download the file as a .txt, but I cannot get at the data.
Thanks

The file appears to contain two embedded XML documents. We can extract each of them into a list with this code:
txt <- readLines("https://www.sec.gov/Archives/edgar/data/1061165/0001567619-21-010580.txt")
grep("</?XML>", txt)
# [1] 46 101 109 719
txt[grep("</?XML>", txt)]
# [1] "<XML>" "</XML>" "<XML>" "</XML>"
A brief inspection of the file prompted that grep: the matches suggest that one XML document starts and stops, and then another starts and stops. Staying within those line ranges, we can extract most of the data with
library(xml2)
first <- as_list(read_xml(paste(txt[47:100], collapse = "")))
str(first)
# List of 1
# $ edgarSubmission:List of 2
# ..$ headerData:List of 2
# .. ..$ submissionType:List of 1
# .. .. ..$ : chr "13F-HR"
# .. ..$ filerInfo :List of 4
# .. .. ..$ liveTestFlag :List of 1
# .. .. .. ..$ : chr "LIVE"
# .. .. ..$ flags :List of 3
# .. .. .. ..$ confirmingCopyFlag :List of 1
# .. .. .. .. ..$ : chr "false"
# .. .. .. ..$ returnCopyFlag :List of 1
# .. .. .. .. ..$ : chr "true"
# .. .. .. ..$ overrideInternetFlag:List of 1
# .. .. .. .. ..$ : chr "false"
# .. .. ..$ filer :List of 1
# .. .. .. ..$ credentials:List of 2
# .. .. .. .. ..$ cik:List of 1
# .. .. .. .. .. ..$ : chr "0001061165"
# .. .. .. .. ..$ ccc:List of 1
# .. .. .. .. .. ..$ : chr "XXXXXXXX"
# .. .. ..$ periodOfReport:List of 1
# .. .. .. ..$ : chr "03-31-2021"
# ..$ formData :List of 3
and the second batch:
second <- as_list(read_xml(paste(txt[110:718], collapse = "")))
str(second)
# List of 1
# $ informationTable:List of 38
# ..$ infoTable:List of 7
# .. ..$ nameOfIssuer :List of 1
# .. .. ..$ : chr "ADOBE SYSTEMS INCORPORATED"
# .. ..$ titleOfClass :List of 1
# .. .. ..$ : chr "COM"
# .. ..$ cusip :List of 1
# .. .. ..$ : chr "00724F101"
# .. ..$ value :List of 1
# .. .. ..$ : chr "1246613"
# .. ..$ shrsOrPrnAmt :List of 2
# .. .. ..$ sshPrnamt :List of 1
# .. .. .. ..$ : chr "2622406"
# .. .. ..$ sshPrnamtType:List of 1
# .. .. .. ..$ : chr "SH"
# .. ..$ investmentDiscretion:List of 1
# .. .. ..$ : chr "SOLE"
# .. ..$ votingAuthority :List of 3
# .. .. ..$ Sole :List of 1
# .. .. .. ..$ : chr "2622406"
# .. .. ..$ Shared:List of 1
# .. .. .. ..$ : chr "0"
# .. .. ..$ None :List of 1
# .. .. .. ..$ : chr "0"
# ..$ infoTable:List of 7
I'm not certain offhand how to extract the front matter, but I hope this is a good enough start.
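To go from the nested `second` list to something tabular, one option is to flatten each infoTable entry with unlist() and bind the results row-wise. A minimal sketch, using a hand-built stand-in (second_like is a hypothetical name; its first entry copies a few fields from the output above, its second entry is made-up data) so that it runs without downloading the filing:

```r
# Stand-in shaped like as_list() output: every leaf is a list holding one string.
second_like <- list(informationTable = list(
  infoTable = list(
    nameOfIssuer = list("ADOBE SYSTEMS INCORPORATED"),
    titleOfClass = list("COM"),
    cusip        = list("00724F101"),
    value        = list("1246613"),
    shrsOrPrnAmt = list(sshPrnamt = list("2622406"), sshPrnamtType = list("SH"))
  ),
  infoTable = list(                       # hypothetical second holding
    nameOfIssuer = list("EXAMPLE CORP (hypothetical)"),
    titleOfClass = list("COM"),
    cusip        = list("000000000"),
    value        = list("100"),
    shrsOrPrnAmt = list(sshPrnamt = list("10"), sshPrnamtType = list("SH"))
  )
))

# unlist() flattens one entry to a named character vector
# (nested names are joined with "."), which becomes a one-row data.frame.
rows <- lapply(unname(second_like$informationTable), function(it) {
  as.data.frame(as.list(unlist(it)), stringsAsFactors = FALSE)
})
holdings <- do.call(rbind, rows)
holdings$value <- as.numeric(holdings$value)
print(holdings)
```

This assumes every infoTable entry has the same set of fields; if some entries carry optional elements, rbind() will complain about mismatched columns and the rows would need to be aligned first.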

Subset (list of lists) nested Lists

I am trying to subset thead/tbody without directly calling rowlist$td$list$item$table$thead or rowlist[[td]][[list]][[item]][[table]][[thead]]. This expression
unlist(rowlist, use.names=FALSE )[ grepl( "tbody", names(unlist(rowlist)))]
serves my purpose, except that I need the result as multiple rows (e.g. the two tr's in tbody). I can split it, but that seems counterintuitive.
I know there should be a better way to work with HTML/XML, but this is what I've got for now.
str(rowlist)
List of 1
$ td:List of 1
..$ list:List of 1
.. ..$ item:List of 1
.. .. ..$ table:List of 2
.. .. .. ..$ thead:List of 1
.. .. .. .. ..$ tr:List of 7
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Test"
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Outcome"
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Subset"
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Cups"
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Bowls"
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Plates"
.. .. .. .. .. ..$ th:List of 1
.. .. .. .. .. .. ..$ : chr "Jars"
.. .. .. ..$ tbody:List of 2
.. .. .. .. ..$ tr:List of 7
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "test1"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "High"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Low"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Gold"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Blue"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Green"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "red"
.. .. .. .. .. ..- attr(*, "ID")= chr "id_511"
.. .. .. .. ..$ tr:List of 7
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "test2"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Low"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "High"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Pink"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Blue"
.. .. .. .. .. ..$ td:List of 1
.. .. .. .. .. .. ..$ : chr "Purple"
.. .. .. .. .. ..$ td: list()
.. .. .. .. .. ..- attr(*, "ID")= chr "id_512"
.. ..- attr(*, "styleCode")= chr "none"
The list looks like this:
rowlist<-list(td = structure(list(list = structure(list(item = list(table = list(
thead = list(tr = list(
th = list("Test"), th = list("Outcome"), th = list("Set"), th = list("Cups"), th = list("Bowls"), th = list( "Plates"), th = list("Jars"))),
tbody = list(tr = structure(
list(td = list("test1"), td = list("High"), td = list("Low"), td = list("Gold"), td = list("Blue"), td = list("Green"), td = list("Red")), ID = "id_511"),
tr = structure(
list(td = list("test2"), td = list("Low"), td = list("High"), td = list("Pink"), td = list("Blue"), td = list("Purple"), td = list()), ID = "id_512"))))), styleCode = "none")), colspan = "20"))

If the object has to be handled as a nested list, one approach is to use rrapply in the rrapply-package (extension of base rapply):
library(rrapply) ## v1.2.1
out <- rrapply(rowlist,
               classes = "list",
               condition = function(x, .xname) .xname %in% c("thead", "tbody"),
               how = "flatten")
str(out, list.len = 2)
#> List of 2
#> $ thead:List of 1
#> ..$ tr:List of 7
#> .. ..$ th:List of 1
#> .. .. ..$ : chr "Test"
#> .. ..$ th:List of 1
#> .. .. ..$ : chr "Outcome"
#> .. .. [list output truncated]
#> $ tbody:List of 2
#> ..$ tr:List of 7
#> .. ..$ td:List of 1
#> .. .. ..$ : chr "test1"
#> .. ..$ td:List of 1
#> .. .. ..$ : chr "High"
#> .. .. [list output truncated]
#> .. ..- attr(*, "ID")= chr "id_511"
#> ..$ tr:List of 7
#> .. ..$ td:List of 1
#> .. .. ..$ : chr "test2"
#> .. ..$ td:List of 1
#> .. .. ..$ : chr "Low"
#> .. .. [list output truncated]
#> .. ..- attr(*, "ID")= chr "id_512"
Here, the condition function returns only nodes with names thead or tbody, how = "flatten" returns the nodes in a flat list (how = "prune" would prune the nodes keeping the original list structure), and classes = "list" does not skip intermediate list nodes (as would be the case with base rapply()).
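Once thead and tbody are isolated, the "multiple rows" part of the question can be handled in base R. A minimal sketch, assuming thead and tbody have the shape shown in the str() output (trimmed here to three columns to keep it short; cell_text() is a hypothetical helper):

```r
# Trimmed stand-ins for out$thead and out$tbody.
thead <- list(tr = list(th = list("Test"), th = list("Outcome"), th = list("Subset")))
tbody <- list(
  tr = structure(list(td = list("test1"), td = list("High"), td = list("Low")),
                 ID = "id_511"),
  tr = structure(list(td = list("test2"), td = list("Low"), td = list()),
                 ID = "id_512")
)

# A cell is a list holding one string; an empty cell (list()) becomes NA.
cell_text <- function(cell) {
  if (length(cell) == 0) NA_character_ else as.character(cell[[1]])
}

# One character vector per tr, stacked into a matrix, then a data.frame.
rows <- lapply(unname(tbody), function(tr) {
  vapply(tr, cell_text, character(1), USE.NAMES = FALSE)
})
tab <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(tab) <- vapply(thead$tr, cell_text, character(1), USE.NAMES = FALSE)
print(tab)
```

The explicit NA for empty cells keeps every row the same length, which a plain unlist() would not, since it drops empty list() entries.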

Using rvest to login into webpage with pop up sign in

I would like to log in to a webpage with a pop-up sign-in window. This article logs into Stack Overflow, a webpage that has a visible login form. How can I use rvest to log in to websites that don't have visible login forms? For example, the Washington Post's website has a sign-in box on the top right of the page. Once clicked, a form appears where you can sign in.
library(rvest)
url <- 'https://www.rotary.org/myrotary/en'
url2 <- 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
url3 <- 'https://www.washingtonpost.com/?noredirect=on'
If I get the structure of the forms on StackOverflow's login page,
pg_session <- html_session(url2)
html_form(pg_session) %>% str
List of 2
$ :List of 5
..$ name : chr "search"
..$ method : chr "GET"
..$ url : chr "/search"
..$ enctype: chr "form"
..$ fields :List of 2
.. ..$ q :List of 7
.. .. ..$ name : chr "q"
.. .. ..$ type : chr "text"
.. .. ..$ value : chr ""
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..$ <unnamed>:List of 7
.. .. ..$ name : chr "<unnamed>"
.. .. ..$ type : chr "submit"
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "button"
.. ..- attr(*, "class")= chr "fields"
..- attr(*, "class")= chr "form"
$ :List of 5
..$ name : chr "login-form"
..$ method : chr "POST"
..$ url : chr "/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f"
..$ enctype: chr "form"
..$ fields :List of 7
.. ..$ fkey :List of 7
.. .. ..$ name : chr "fkey"
.. .. ..$ type : chr "hidden"
.. .. ..$ value : chr "d5f8c65b7d92b368b4b58e43e59fd9d82cb4436bac4a6d430771d50b85e771aa"
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..$ ssrc :List of 7
.. .. ..$ name : chr "ssrc"
.. .. ..$ type : chr "hidden"
.. .. ..$ value : chr "head"
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..$ email :List of 7
.. .. ..$ name : chr "email"
.. .. ..$ type : chr "email"
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..$ password :List of 7
.. .. ..$ name : chr "password"
.. .. ..$ type : chr "password"
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..$ submit-button:List of 7
.. .. ..$ name : chr "submit-button"
.. .. ..$ type : NULL
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "button"
.. ..$ oauth_version:List of 7
.. .. ..$ name : chr "oauth_version"
.. .. ..$ type : chr "hidden"
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..$ oauth_server :List of 7
.. .. ..$ name : chr "oauth_server"
.. .. ..$ type : chr "hidden"
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..- attr(*, "class")= chr "fields"
..- attr(*, "class")= chr "form"
I can clearly locate where to fill in my email and password. However, I can't find those fields in the structure of the forms on the Washington Post's home page, which makes it difficult to call the form I need.
pg_session <- html_session(url3)
html_form(pg_session) %>% str
List of 1
$ :List of 5
..$ name : chr "search-form"
..$ method : chr "GET"
..$ url : chr "//www.washingtonpost.com/newssearch/"
..$ enctype: chr "form"
..$ fields :List of 1
.. ..$ query:List of 7
.. .. ..$ name : chr "query"
.. .. ..$ type : chr "text"
.. .. ..$ value : NULL
.. .. ..$ checked : NULL
.. .. ..$ disabled: NULL
.. .. ..$ readonly: NULL
.. .. ..$ required: logi FALSE
.. .. ..- attr(*, "class")= chr "input"
.. ..- attr(*, "class")= chr "fields"
..- attr(*, "class")= chr "form"
My particular case is logging in to this site; however, the Washington Post's pop-up login seems similar enough that the procedure should be the same. How can I call these pop-up logins?
*I am not too familiar with HTML, so if there are better terms to use or ways to phrase this, feel free to correct me.
