I'm scraping a Canadian federal website for a research project on online petitions. The site is here: http://www.oag-bvg.gc.ca/internet/English/pet_lp_e_940.html
I need to get the following information for each petition: the hyperlink, the petition number, title, issue(s), petitioner(s), date received, status, and summary.
For instance, in Aboriginal Affairs
[http://www.oag-bvg.gc.ca/internet/English/pet_lpf_e_38167.html], I started with the following code, but I am blocked after extracting the title with //h1.
library("rvest")
library("tm")        # tm -> making a corpus and saving it
library("lubridate")

BASE  <- "http://www.oag-bvg.gc.ca/internet/English/"
url   <- paste0(BASE, "pet_lpf_e_38167.html")   # the Aboriginal Affairs listing
page  <- html(url)
paras <- html_text(html_nodes(page, xpath = '//p'))
text  <- paste(paras, collapse = ' ')
getdata <- function(url){
  page  <- html(url)
  title <- html_text(html_node(page, xpath = '//h1'))
  # The rest is copy-pasted from code someone gave me; `date` and `text`
  # are never extracted here, which is where I am stuck.
  list(title = title,
       date  = parse_date_time(date, "%B %d, %Y"),
       text  = paste(text, collapse = ' '))
}
index  <- html(paste0(BASE, "pet_lp_e_940.html"))
links  <- html_nodes(index, xpath = '//ul/li/a')
slinks <- html_attr(links, 'href')

texts   <- c()
authors <- c()
dates   <- c()
for (s in slinks){
  page <- paste0(BASE, s)
  cat('.')  # progress
  d <- getdata(page)
  texts   <- append(texts, d$text)
  authors <- append(authors, d$author)
  dates   <- append(dates, d$date)
}
library(XML)
library(rvest)
# please use this code only if the website allows you to scrape
# get all HTML links on the home page related to online petitions
kk <- getHTMLLinks("http://www.oag-bvg.gc.ca/internet/English/pet_lp_e_940.html")
# iterate over each petition topic page (pattern pet_lpf_e) and get all petition links listed under that topic
dd <- lapply(grep("pet_lpf_e", kk, value = TRUE), function(x){
  paste0("http://www.oag-bvg.gc.ca", x) %>%
    getHTMLLinks
})
# get all the petition weblinks
ee <- do.call(rbind, lapply(dd, function(x) grep("pet_[0-9]{3}_e", x, value = TRUE)))
# iterate over ee and get the details for each petition
ff <- lapply(ee, function(y){
  paste0("http://www.oag-bvg.gc.ca", y) %>%
    html %>%
    html_nodes(c("p", "h1")) %>%   # h1 is the title and p the paragraphs
    html_text() %>%
    .[1:7] %>%
    cbind(., link = paste0("http://www.oag-bvg.gc.ca", y))
})
For example, the details of the first petition:
> ff[[1]]
[1,] "Federal role and action in response to the Obed Mountain Mine coal slurry spill into the Athabasca River watershed"
[2,] "Petition: 362 "
[3,] "Issue(s): Aboriginal affairs, compliance and enforcement, human/environmental health, toxic substances, water"
[4,] "Petitioner(s): Keepers of the Athabasca Watershed Society and Ecojustice"
[5,] "Date Received: 24 March 2014"
[6,] "Status: Completed"
[7,] "Summary: The petition raises concerns about the federal government’s role and actions in response to the October 2013 Obed Mountain Mine coal slurry spill into the Athabasca River watershed. The petition summarizes the events surrounding the spill, and includes information about the toxic substances that may have been contained in the slurry, such as polycyclic aromatic hydrocarbons, arsenic, cadmium, lead, and mercury. According to the petition, about 670 million litres of slurry were released into the environment; the spill had an impact on fish habitat in nearby streams; and the plume may have travelled far downstream and had a potential impact on municipal drinking water. The petitioners ask the government about its approvals and inspections prior to the spill, as well as its response to the spill, including investigations, future monitoring, and habitat remediation. "
link
[1,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
[2,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
[3,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
[4,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
[5,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
[6,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
[7,] "http://www.oag-bvg.gc.ca/internet/English/pet_362_e_39682.html"
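Going one step further, the 7-by-2 matrices in ff can be collapsed into a single data frame with one row per petition. This is only a sketch building on the output above, and it assumes the h1/p nodes always come back in the order shown (title, number, issue(s), petitioner(s), date received, status, summary):
# sketch: one row per petition, assuming each element of ff has the 7 rows shown above
petitions <- do.call(rbind, lapply(ff, function(m){
  data.frame(link       = m[1, 2],
             title      = m[1, 1],
             number     = m[2, 1],
             issues     = m[3, 1],
             petitioner = m[4, 1],
             date       = m[5, 1],
             status     = m[6, 1],
             summary    = m[7, 1],
             stringsAsFactors = FALSE)
}))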
The page I'm scraping has six sections listing people, each introduced by an <h3> tag.
How can I use XPath to select these six sections separately (using rvest), perhaps into a nested list? My goal is to later lapply through these six sections to fetch the people's names and affiliations (separated by section).
The HTML isn't well structured, i.e. not all of the text is wrapped in specific tags. An example:
<h3>Editor-in-Chief</h3>
Claudio Ronco – <i>St. Bartolo Hospital</i>, Vicenza, Italy<br />
<br />
<h3>Clinical Engineering</h3>
William R. Clark – <i>Purdue University</i>, West Lafayette, IN, USA<br />
Hideyuki Kawanashi – <i>Tsuchiya General Hospital</i>, Hiroshima, Japan<br />
I access the site with the following code:
journal_url <- "https://www.karger.com/Journal/EditorialBoard/223997"
webpage <- rvest::html_session(journal_url,
                               httr::user_agent("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20"))
webpage <- rvest::html_nodes(webpage, css = '#editorialboard')
I tried various XPaths to extract the six sections with html_nodes into a nested list of six lists, but none of them work properly:
# this gives me a list of 190 (instead of 6) elements, leaving out the text between <i> and </i>
webpage <- rvest::html_nodes(webpage, xpath = '//text()[preceding-sibling::h3 and following-sibling::h3]')
# this gives me a list of 190 (instead of 6) elements, leaving out text that is not inside tags
webpage <- rvest::html_nodes(webpage, xpath = '//*[preceding-sibling::h3 and following-sibling::h3]')
# error "VECTOR_ELT() can only be applied to a 'list', not a 'logical'"
webpage <- rvest::html_nodes(webpage, xpath = '//* and text()[preceding-sibling::h3 and following-sibling::h3]')
# this gives me a list of 274 (instead of 6) elements
webpage <- rvest::html_nodes(webpage, xpath = '//text()[preceding-sibling::h3]')
Are you OK with an ugly solution that does not use XPath? I don't think you can get a nested list directly from the structure of this website, but then I am not very experienced with XPath.
I first get the headings, split the raw text on the heading names, and then, within each group, split the members using '\n' as a separator.
library(rvest)

journal_url <- "https://www.karger.com/Journal/EditorialBoard/223997"
webpage <- read_html(journal_url) %>% html_node(css = '#editorialboard')

# get h3 headings
headings <- webpage %>% html_nodes('h3') %>% html_text()

# get raw text
raw.text <- webpage %>% html_text()

# split raw text on h3 headings and put the pieces in a list
list.members <- list()
raw.text.2 <- raw.text
for (h in headings) {
  # split on the current heading
  b <- strsplit(raw.text.2, h, fixed = TRUE)
  # split the members of the preceding group using \n as separator
  members <- strsplit(b[[1]][1], '\n', fixed = TRUE)
  # clean empty elements from the vector
  members <- list(members[[1]][members[[1]] != ""])
  # add the vector of members to the list
  list.members <- c(list.members, members)
  # keep the remaining text for the next iteration
  raw.text.2 <- b[[1]][2]
}
# remove the first element of the main list (text before the first heading)
list.members <- list.members[2:length(list.members)]
# add the final segment of raw.text to the list
members <- strsplit(raw.text.2, '\n', fixed = TRUE)
members <- list(members[[1]][members[[1]] != ""])
list.members <- c(list.members, members)
# name the list elements after the headings
names(list.members) <- headings
Then you get a list of the groups, where each element is a character vector with one string per member (containing all the info):
> list.members$`Editor-in-Chief`
[1] "Claudio Ronco – St. Bartolo Hospital, Vicenza, Italy"
> list.members$`Clinical Engineering`
[1] "William R. Clark – Purdue University, West Lafayette, IN, USA"
[2] "Hideyuki Kawanashi – Tsuchiya General Hospital, Hiroshima, Japan"
[3] "Tadayuki Kawasaki – Mobara Clinic, Mobara City, Japan"
[4] "Jeongchul Kim – Wake Forest School of Medicine, Winston-Salem, NC, USA"
[5] "Anna Lorenzin – International Renal Research Institute of Vicenza, Vicenza, Italy"
[6] "Ikuto Masakane – Honcho Yabuki Clinic, Yamagata City, Japan"
[7] "Michio Mineshima – Tokyo Women's Medical University, Tokyo, Japan"
[8] "Tomotaka Naramura – Kurashiki University of Science and the Arts, Kurashiki, Japan"
[9] "Mauro Neri – International Renal Research Institute of Vicenza, Vicenza, Italy"
[10] "Masanori Shibata – Koujukai Rehabilitation Hospital, Nagoya City, Japan"
[11] "Yoshihiro Tange – Kyushu University of Health and Welfare, Nobeoka-City, Japan"
[12] "Yoshiaki Takemoto – Osaka City University, Osaka City, Japan"
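If you then want the names and affiliations separated (as mentioned in the question), one possible follow-up is to split each member string on the en dash. This is only a sketch and assumes every entry follows the "Name – Affiliation, City, Country" pattern shown above:
# sketch: split "Name – Affiliation, ..." strings into columns, one row per member
members_df <- do.call(rbind, lapply(names(list.members), function(section){
  parts <- strsplit(list.members[[section]], " – ", fixed = TRUE)
  data.frame(section     = section,
             name        = sapply(parts, `[`, 1),
             affiliation = sapply(parts, `[`, 2),
             stringsAsFactors = FALSE)
}))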
I have a matrix name_total = matrix(nrow = 51, ncol = 3, NA), where each row corresponds to a state (the 51st being the District of Columbia). The first column is a string giving the name of the state (for example, name_total[1,1] = "Alabama").
The second and third columns are URLs of CSV files from the Census, linking counties to state senate districts and to state house districts, respectively.
For Alabama:
name_total[1,2] ="http://www2.census.gov/geo/relfiles/cdsld13/01/co_lu_delim_01.txt"
name_total[1,3] ="http://www2.census.gov/geo/relfiles/cdsld13/01/co_ll_delim_01.txt"
As the final output, I want a table listing all 50 states plus DC with their counties and the linked Senate and House districts. I don't know if that's very clear, so here is an example:
[,1] [,2] [,3] [,4]
[1,] "Alabama" "countyX1" "Senate District Y1" "House District Z1"
[2,] "Alabama" "countyX2" "Senate District Y2" "House District Z2"
[3,] "Alabama" "countyX3" "Senate District Y3" "House District Z3"
[4,] "Alaska" "countyX4" "Senate District Y4" "House District Z4"
[5,] "Alaska" "countyX5" "Senate District Y4" "House District Z5"
I use a for loop:
for (i in 1:51){
  state  = name_total[i, 1]
  senate = name_total[i, 2]
  house  = name_total[i, 3]
  link_senate = url(senate)
  link_house  = url(house)
  data_senate = read.csv2(link_senate, sep = ",", header = TRUE, skip = 1)
  data_house  = read.csv2(link_house, sep = ",", header = TRUE, skip = 1)
  final = cbind(state, data_senate, data_house)
}
Of course, each element has a different number of rows: for Alabama (i = 1), state is the single string "Alabama", while the senate and house files come back as 3-by-122 and 3-by-207 tables respectively. I get an error message about these differences in the number of rows.
I'm pretty sure one of the issues is the use of the cbind function, but I don't know what to use instead to get a better result.
In case others have similar issues, I found a way to get what I wanted, separately for state Senates and state Houses. First of all, some of the states only have one of the two, and the link for Oregon was down; I took those out of my original data.
Then I initialized A and B with the first state, outside of the loop:
senate      = url(name_total[1, 2])
data_senate = read.csv2(senate, sep = ",", header = TRUE, skip = 1)
A = assign(paste("Base_senate_", name_total[1, 1], sep = ""), data_senate)

house      = url(name_total[1, 3])
data_house = read.csv2(house, sep = ",", header = TRUE, skip = 1)
B = assign(paste("Base_house_", name_total[1, 1], sep = ""), data_house)
and then I used a for loop:
for (i in 2:48){
  senate = url(name_total[i, 2])
  house  = url(name_total[i, 3])

  data_senate = read.csv2(senate, sep = ",", header = TRUE, skip = 1)
  names(data_senate)[2] = "County"
  A = rbind(A, assign(paste("Base_senate_", name_total[i, 1], sep = ""), data_senate))

  data_house = read.csv2(house, sep = ",", header = TRUE, skip = 1)
  names(data_house)[2] = "County"
  B = rbind(B, assign(paste("Base_house_", name_total[i, 1], sep = ""), data_house))
}
A and B give you the expected tables (without the string name of the State, but the first variable identifies the state).
I had to use the names(data_senate)[2] = "County" because the second column had a different name for some states.
Hope it helps!
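For what it's worth, the same idea can be written a bit more compactly without assign(), by building one data frame per state in a list and binding them at the end. This is only a sketch, under the same assumptions as above (48 usable rows left in name_total, the same read.csv2 arguments, and compatible columns once the second one is renamed):
# sketch: accumulate in a list and rbind once, instead of growing A and B with assign()
read_state <- function(i, col){
  d <- read.csv2(url(name_total[i, col]), sep = ",", header = TRUE, skip = 1)
  names(d)[2] <- "County"              # harmonise the differing column names
  cbind(State = name_total[i, 1], d)   # keep the state name as a column
}
A <- do.call(rbind, lapply(1:48, read_state, col = 2))  # senate districts
B <- do.call(rbind, lapply(1:48, read_state, col = 3))  # house districts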
I have the following code:
rm(list=ls(all=TRUE)) #clear data
setwd("~/UCSB/14 Win 15/Issy/text.fwt") #set working directory
files <- list.files(); head(files) #load & check working directory
fw1 <- scan(what="c", sep="\n",file="fw_chp01.fwt")
library(tm)
corpus2 <- Corpus(VectorSource(c(fw1)))
skipWords <- function(x) removeWords(x, stopwords("english"))
# remove punctuation, numbers, stopwords, etc.
funcs <- list(content_transformer(tolower), removePunctuation, removeNumbers, stripWhitespace, skipWords)
corpus2.proc <- tm_map(corpus2, FUN = tm_reduce, tmFuns = funcs)
corpus2a.dtm <- DocumentTermMatrix(corpus2.proc, control = list(wordLengths = c(1, 110)))  # create document-term matrix
I'm trying to use some of the operations detailed in the tm reference manual (http://cran.r-project.org/web/packages/tm/tm.pdf), with little success. For example, when I try to use findFreqTerms, I get the following error:
Error: inherits(x, c("DocumentTermMatrix", "TermDocumentMatrix")) is not TRUE
Can anyone clue me in as to why this isn't working and what I can do to fix it?
Edited for @lawyeR:
head(fw1) produces the first six lines of the text (Episode 1 of Finnegans Wake by James Joyce):
[1] "003.01 riverrun, past Eve and Adam's, from swerve of shore to bend"
[2] "003.02 of bay, brings us by a commodius vicus of recirculation back to"
[3] "003.03 Howth Castle and Environs."
[4] "003.04 Sir Tristram, violer d'amores, fr'over the short sea, had passen-"
[5] "003.05 core rearrived from North Armorica on this side the scraggy"
[6] "003.06 isthmus of Europe Minor to wielderfight his penisolate war: nor"
inspect(corpus2) outputs each line of the text in the following format (this is the final line of the text):
[[960]]
<<PlainTextDocument (metadata: 7)>>
029.36 borough. #this part differs by line of course
inspect(corpus2a.dtm) returns a table of all the types (there are 4,163 in total) in the text, in the following format:
Docs youths yoxen yu yurap yutah zee zephiroth zine zingzang zmorde zoom
1 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0
Here is a simplified form of what you provided and did, and tm does its job. It may be that one or more of your cleaning steps caused a problem.
> library(tm)
> fw1 <- c("riverrun, past Eve and Adam's, from swerve of shore to bend
+ of bay, brings us by a commodius vicus of recirculation back to
+ Howth Castle and Environs.
+ Sir Tristram, violer d'amores, fr'over the short sea, had passen-
+ core rearrived from North Armorica on this side the scraggy
+ isthmus of Europe Minor to wielderfight his penisolate war: nor")
>
> corpus<-Corpus(VectorSource(c(fw1)))
> inspect(corpus)
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
riverrun, past Eve and Adam's, from swerve of shore to bend
of bay, brings us by a commodius vicus of recirculation back to
Howth Castle and Environs.
Sir Tristram, violer d'amores, fr'over the short sea, had passen-
core rearrived from North Armorica on this side the scraggy
isthmus of Europe Minor to wielderfight his penisolate war: nor
> dtm <- DocumentTermMatrix(corpus)
> findFreqTerms(dtm)
[1] "adam's," "and" "armorica" "back" "bay," "bend"
[7] "brings" "castle" "commodius" "core" "d'amores," "environs."
[13] "europe" "eve" "fr'over" "from" "had" "his"
[19] "howth" "isthmus" "minor" "nor" "north" "passen-"
[25] "past" "penisolate" "rearrived" "recirculation" "riverrun," "scraggy"
[31] "sea," "shore" "short" "side" "sir" "swerve"
[37] "the" "this" "tristram," "vicus" "violer" "war:"
[43] "wielderfight"
As another point, I find it useful at the start to load a few other packages that complement tm.
library(SnowballC); library(RWeka); library(rJava); library(RWekajars)
For what it's worth, compared to your somewhat complicated cleaning steps, I usually trudge along like this (replace comments$comment with your text vector):
library(stringr)
comments$comment <- tolower(comments$comment)
comments$comment <- removeNumbers(comments$comment)
comments$comment <- stripWhitespace(comments$comment)
# replace all internal double spaces with a single space
comments$comment <- str_replace_all(comments$comment, "  ", " ")
# better to remove punctuation with str_replace_all because the tm function doesn't insert a space
comments$comment <- str_replace_all(comments$comment, pattern = "[[:punct:]]", " ")
comments$comment <- removeWords(comments$comment, stopwords(kind = "english"))
From another thread: tm 0.6.0 has a bug that can be addressed with this statement.
corpus_clean <- tm_map(corp_stemmed, PlainTextDocument)
Hope this helps.
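If you would rather keep your original corpus-based cleaning steps than switch to a character vector, here is a minimal sketch (assuming tm >= 0.6) that applies them one tm_map() call at a time, wrapping base functions in content_transformer() so the documents stay PlainTextDocuments and findFreqTerms() receives a proper DocumentTermMatrix:
library(tm)
corpus2 <- Corpus(VectorSource(fw1))
corpus2 <- tm_map(corpus2, content_transformer(tolower))
corpus2 <- tm_map(corpus2, removePunctuation)
corpus2 <- tm_map(corpus2, removeNumbers)
corpus2 <- tm_map(corpus2, stripWhitespace)
corpus2 <- tm_map(corpus2, removeWords, stopwords("english"))
corpus2a.dtm <- DocumentTermMatrix(corpus2, control = list(wordLengths = c(1, 110)))
findFreqTerms(corpus2a.dtm, lowfreq = 5)   # terms appearing at least 5 times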
I am using the R package tm.plugin.webmining. Using the function GoogleNewsSource(), I would like to query the news sorted by date and also from a specific date. Is there any parameter to query the news of a specific date?
library(tm)
library(tm.plugin.webmining)
searchTerm <- "Data Mining"
corpusGoog <- WebCorpus(GoogleNewsSource(params = list(hl = "en", q = searchTerm,
                                                       ie = "utf-8", num = 10, output = "rss")))
headers <- meta(corpusGoog,tag="datetimestamp")
If you're looking for a data frame-like structure, this is how you'd go about creating it (note: not all fields are extracted from the corpus):
library(dplyr)
make_row <- function(elem) {
  data.frame(timestamp   = elem[[2]]$datetimestamp,
             heading     = elem[[2]]$heading,
             description = elem[[2]]$description,
             content     = elem$content,
             stringsAsFactors = FALSE)
}
dat <- bind_rows(lapply(corpusGoog, make_row))
str(dat)
## Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 10 obs. of 4 variables:
## $ timestamp : POSIXct, format: "2015-02-03 13:08:16" "2015-01-11 23:37:45" ...
## $ heading : chr "A guide to data mining with Hadoop - Information Age" "Barack Obama to seek limits on student data mining - Politico" "Is data mining riddled with risk or a natural hazard of the internet? - INTHEBLACK" "Why an obscure British data-mining company is worth $3 billion - Quartz" ...
## $ description: chr "Information AgeA guide to data mining with HadoopInformation AgeWith the advent of the Internet of Things and the transition fr"| __truncated__ "PoliticoBarack Obama to seek limits on student data miningPoliticoPresident Barack Obama on Monday is expected to call for toug"| __truncated__ "INTHEBLACKIs data mining riddled with risk or a natural hazard of the internet?INTHEBLACKData mining is now viewed as a serious"| __truncated__ "QuartzWhy an obscure British data-mining company is worth $3 billionQuartzTesco, the troubled British retail group, is starting"| __truncated__ ...
## $ content : chr "A guide to data mining with Hadoop\nHow businesses can realise and capitalise on the opportunities that Hadoop offers\nPosted b"| __truncated__ "By Stephanie Simon\n1/11/15 6:32 PM EST\nPresident Barack Obama on Monday is expected to call for tough legislation to protect "| __truncated__ "By Adam Courtenay\nData mining is now viewed as a serious security threat, but with all the hype, s"| __truncated__ "How We Buy\nJanuary 12, 2015\nTesco, the troubled British retail group, is starting over. After an accounting scandal , a serie"| __truncated__ ...
Then, you can do anything you want. For example:
dat %>%
arrange(timestamp) %>%
select(heading) %>%
head
## Source: local data frame [6 x 1]
##
## heading
## 1 The potential of fighting corruption through data mining - Transparency International (pre
## 2 Barack Obama to seek limits on student data mining - Politico
## 3 Why an obscure British data-mining company is worth $3 billion - Quartz
## 4 Parks and Rec Recap: Treat Yo Self to Some Data Mining - Indianapolis Monthly
## 5 Fraud and data mining in Vancouver…just Outside the Lines - Vancouver Sun (blog)
## 6 'Parks and Rec' Data-Mining Episode Was Eerily True To Life - MediaPost Communications
If you want/need something else, you need to be clearer in your question.
I was looking at the Google query string and noticed it passes startdate and enddate tags in the query if you click the dates on the right-hand side of the page.
You can use the same tag names and your results will be confined to the start and end dates.
GoogleNewsSource(query, params = list(hl = "en", q = query, ie = "utf-8",
                                      start = 0, num = 25, output = "rss",
                                      startdate = '2015-10-26', enddate = '2015-10-28'))
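As a quick sanity check (a sketch, assuming the corpus is rebuilt with the startdate/enddate parameters above), you can pull the datetime stamps back out and confirm they fall inside the requested window:
# sketch: inspect the timestamps of the fetched items
stamps <- sapply(corpusGoog, function(doc) format(meta(doc, "datetimestamp")))
sort(stamps)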
library(quantmod)
getSymbols("GDPC1",src = "FRED")
I am trying to extract not only the numerical economic/financial data from FRED but also the metadata. I want to chart CPI and use the metadata as labels/footnotes. Is there a way to extract this metadata using the quantmod package?
Title: Real Gross Domestic Product
Series ID: GDPC1
Source: U.S. Department of Commerce: Bureau of Economic Analysis
Release: Gross Domestic Product
Seasonal Adjustment: Seasonally Adjusted Annual Rate
Frequency: Quarterly
Units: Billions of Chained 2009 Dollars
Date Range: 1947-01-01 to 2014-01-01
Last Updated: 2014-06-25 7:51 AM CDT
Notes: BEA Account Code: A191RX1
Real gross domestic product is the inflation adjusted value of the
goods and services produced by labor and property located in the
United States.
For more information see the Guide to the National Income and Product
Accounts of the United States (NIPA) -
(http://www.bea.gov/national/pdf/nipaguid.pdf)
You can use the same code that's in the body of getSymbols.FRED, but change ".csv" to ".xls", then read the metadata you're interested in from the .xls file.
library(gdata)
Symbol <- "GDPC1"
FRED.URL <- "http://research.stlouisfed.org/fred2/series"
tmp <- tempfile()
download.file(paste0(FRED.URL, "/", Symbol, "/downloaddata/", Symbol, ".xls"),
              destfile = tmp)
read.xls(tmp, nrows=17, header=FALSE)
# V1 V2
# 1 Title: Real Gross Domestic Product
# 2 Series ID: GDPC1
# 3 Source: U.S. Department of Commerce: Bureau of Economic Analysis
# 4 Release: Gross Domestic Product
# 5 Seasonal Adjustment: Seasonally Adjusted Annual Rate
# 6 Frequency: Quarterly
# 7 Units: Billions of Chained 2009 Dollars
# 8 Date Range: 1947-01-01 to 2014-01-01
# 9 Last Updated: 2014-06-25 7:51 AM CDT
# 10 Notes: BEA Account Code: A191RX1
# 11 Real gross domestic product is the inflation adjusted value of the
# 12 goods and services produced by labor and property located in the
# 13 United States.
# 14
# 15 For more information see the Guide to the National Income and Product
# 16 Accounts of the United States (NIPA) -
# 17 (http://www.bea.gov/national/pdf/nipaguid.pdf)
Instead of hardcoding nrows=17, you can use grep to search for the row that has the headers of the data, and subset to only include rows before that.
dat <- read.xls(tmp, header=FALSE, stringsAsFactors=FALSE)
dat[seq_len(grep("DATE", dat[, 1])-1),]
unlink(tmp) # remove the temp file when you're done with it.
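To tie this back to the original goal of labelling a chart, here is a small sketch of my own (it assumes the label rows look exactly like the output above, i.e. "Title:" and "Units:" in the first column) that reuses those rows as plot annotations for the series pulled with quantmod:
library(quantmod)
getSymbols("GDPC1", src = "FRED")
meta_rows <- dat[seq_len(grep("DATE", dat[, 1]) - 1), ]
ttl  <- meta_rows[meta_rows[, 1] == "Title:", 2]
unts <- meta_rows[meta_rows[, 1] == "Units:", 2]
plot(GDPC1, main = paste0(ttl, " (", unts, ")"))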
FRED has a straightforward, well-documented JSON interface (http://api.stlouisfed.org/docs/fred/) which provides both metadata and time series data for all of its economic series. Access requires a FRED account and API key, but these are available on request from http://api.stlouisfed.org/api_key.html.
The Excel-style descriptive data you asked for can be retrieved using:
get.FRSeriesTags <- function(seriesNam)
{
  # seriesNam = character string containing the ID identifying the FRED series to be retrieved
  library("httr")
  library("jsonlite")

  # dummy FRED api key; request a valid key from http://api.stlouisfed.org/api_key.html
  apiKey   <- "&api_key=abcdefghijklmnopqrstuvwxyz123456"
  base     <- "http://api.stlouisfed.org/fred/"
  seriesID <- paste("series_id=", seriesNam, sep = "")
  fileType <- "&file_type=json"

  # get series descriptive data
  datType <- "series?"
  url     <- paste(base, datType, seriesID, apiKey, fileType, sep = "")
  series  <- fromJSON(url)$seriess

  # get series tag data
  datType <- "series/tags?"
  url     <- paste(base, datType, seriesID, apiKey, fileType, sep = "")
  tags    <- fromJSON(url)$tags

  # format as Excel-style descriptive rows
  description <- data.frame(Title        = series$title[1],
                            Series_ID    = series$id[1],
                            Source       = tags$notes[tags$group_id == "src"][1],
                            Release      = tags$notes[tags$group_id == "gen"][1],
                            Frequency    = series$frequency[1],
                            Units        = series$units[1],
                            Date_Range   = paste(series[1, c("observation_start", "observation_end")], collapse = " to "),
                            Last_Updated = series$last_updated[1],
                            Notes        = series$notes[1],
                            row.names    = series$id[1])
  return(t(description))
}
Retrieving the actual time series data would be done in a similar way. There are several json packages available for R but jsonlite works particularly well for this application.
There's a bit more to setting this up than the previous answer but perhaps worth it if you do much with FRED data.
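A sketch of that "similar way" for the observations themselves, following the same URL and API-key conventions as get.FRSeriesTags (the key below is again a dummy; the series/observations endpoint returns one date and value per row, with missing values coded as "."):
get.FRSeriesObs <- function(seriesNam){
  library("httr")
  library("jsonlite")
  apiKey <- "&api_key=abcdefghijklmnopqrstuvwxyz123456"   # dummy key
  base   <- "http://api.stlouisfed.org/fred/"
  url    <- paste(base, "series/observations?", "series_id=", seriesNam, apiKey,
                  "&file_type=json", sep = "")
  obs <- fromJSON(url)$observations
  data.frame(date  = as.Date(obs$date),
             value = suppressWarnings(as.numeric(obs$value)))  # "." becomes NA
}
gdp <- get.FRSeriesObs("GDPC1")
head(gdp)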