rvest: select an option and submit form - r

I am trying to extract the unemployment rate data from this site. In the form, there is a select tag with some options. I can extract the table for the default years, 2007 to 2017, but I am having a hard time setting values for from_year and to_year. Here is the code I have so far:
session = html_session("https://data.bls.gov/timeseries/LNS14000000")
form = read_html("https://data.bls.gov/timeseries/LNS14000000") %>% html_node("table form") %>% html_form()
set_values(form, from_year = 2000, to_year = as.numeric(format(Sys.Date(), "%Y"))) # nothing happened if I set the value for years
submit_form(session, form)
It doesn't work as expected.

Thanks so much, @Andrew!
I can use the API to extract the data.
library(rjson)
library(blsAPI)
uer1 <- list(
'seriesid'=c('LNS14000000'),
'startyear'=2000,
'endyear'=2009)
response <- blsAPI(uer1, 2, TRUE)
The response looks like:
year period periodName value seriesID
1 2009 M12 December 9.9 LNS14000000
2 2009 M11 November 9.9 LNS14000000
3 2009 M10 October 10.0 LNS14000000
4 2009 M09 September 9.8 LNS14000000
5 2009 M08 August 9.6 LNS14000000
6 2009 M07 July 9.5 LNS14000000
...
Note that the API has some query limits.
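Because of those limits, a long date range may need to be split across several requests. The following is a minimal sketch of building one payload per chunk of years. The helper name year_chunks and the chunk size of 10 years are assumptions for illustration; check the current BLS documentation for the limit that applies to your API version and key.

```r
# Split [start, end] into consecutive chunks of at most `max_years` years.
# `year_chunks` is a hypothetical helper, not part of blsAPI.
year_chunks <- function(start, end, max_years) {
  starts <- seq(start, end, by = max_years)
  ends <- pmin(starts + max_years - 1, end)
  data.frame(startyear = starts, endyear = ends)
}

# 10 years per request is an assumed limit, used here only as an example.
chunks <- year_chunks(2000, 2024, max_years = 10)

# One payload per chunk, in the same shape as the payload used above.
payloads <- lapply(seq_len(nrow(chunks)), function(i) {
  list(
    seriesid  = c("LNS14000000"),
    startyear = chunks$startyear[i],
    endyear   = chunks$endyear[i]
  )
})

# Each payload could then be sent the same way as before, for example:
# results  <- lapply(payloads, function(p) blsAPI(p, 2, TRUE))
# combined <- do.call(rbind, results)
```

The requests are left commented out so that the sketch runs without network access or an API key.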

Related

select the data by month and years in R

I have a data frame ordered by month and year. I want to keep only a whole number of years, i.e. if the data start in July 2002 and end in September 2010, then select only the data from July 2002 to June 2010.
And if the data start in September 1992 and end in March 2000, then select only the data from September 1992 to August 1999, regardless of any missing months in between.
The data can be downloaded from the following link:
enter link description here
The code:
mydata <- read.csv("E:/mydata.csv", stringsAsFactors=TRUE)
This is the manual selection:
selected.data <- mydata[1:73,] # July 2002 to June 2010
How can I achieve that in code?
Here is a base R solution that reproduces your manual subsetting:
mydata <- read.csv("D:/mydata.csv", stringsAsFactors=FALSE)
lookup <-
c(
January = 1,
February = 2,
March = 3,
April = 4,
May = 5,
June = 6,
July = 7,
August = 8,
September = 9,
October = 10,
November = 11,
December = 12
)
mydata$Month <- unlist(lapply(mydata$Month, function(x) lookup[match(x, names(lookup))]))
first.month <- mydata$Month[1]
last.year <- max(mydata$Year)
mydata[1:which(mydata$Month==(first.month -1)&mydata$Year==last.year),]
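Basically, I convert the month names to numbers and, within the last year of the data frame, find the month preceding the first month that appears in the data frame. This assumes that this month is present in the last year and that the data do not start in January.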
Here's a base R one-liner:
result <- mydata[seq_len(with(mydata, which(Month == month.name[match(Month[1],
month.name) - 1] & Year == max(Year)))), ]
head(result)
# Month Year var
#1 July 2002 -91.22997
#2 October 2002 -91.19007
#3 December 2002 -91.05395
#4 February 2003 -91.16958
#5 March 2003 -91.17881
#6 April 2003 -91.15110
tail(result)
# Month Year var
#68 December 2009 -90.92610
#69 January 2010 -91.07379
#70 February 2010 -91.12460
#71 March 2010 -91.10288
#72 April 2010 -91.06040
#73 June 2010 -90.94212
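Both answers above depend on the month before the first month actually being present in the last year. As a rough sketch of a variant that does not depend on that, one could count months from the start and keep only the rows that fall inside complete 12-month blocks. The helper name trim_to_whole_years and the toy data are made up for illustration; Month is assumed to hold full English month names and Year a numeric year, as in the output above.

```r
# Keep only rows that fall inside complete 12-month blocks counted from the
# first month in the data. `trim_to_whole_years` is a hypothetical helper.
trim_to_whole_years <- function(df) {
  m <- match(df$Month, month.name)        # month name -> 1..12
  idx <- df$Year * 12 + (m - 1)           # running month index
  span <- max(idx) - min(idx) + 1         # months covered, inclusive
  n_years <- span %/% 12                  # number of complete years
  last_kept <- min(idx) + 12 * n_years - 1
  df[idx <= last_kept, , drop = FALSE]
}

# Toy data: starts in July 2002, ends in September 2010, June 2010 is missing.
toy <- data.frame(
  Month = c("July", "October", "December", "May", "August", "September"),
  Year  = c(2002, 2002, 2002, 2010, 2010, 2010),
  stringsAsFactors = FALSE
)

trimmed <- trim_to_whole_years(toy)
```

Here the cut-off is June 2010, so the May 2010 row is kept and the August and September 2010 rows are dropped, even though June 2010 itself has no row.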

Aggregating based on previous year and this year

I have this data set:
month Year Rain
10 2010 376.8
11 2010 282.78
12 2010 324.58
1 2011 73.51
2 2011 225.89
3 2011 22.96
I used
df2prnext<-
aggregate(Rain~Year, data = subdataprnext, mean)
but I need the mean value of 217.53.
I am not getting the expected result. Thank you for your help.
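One possible reading of the problem is that aggregate(Rain ~ Year, ...) returns one mean per calendar year, whereas the months October to March form a single season that spans two years. A minimal sketch under that assumption: add a grouping column that assigns months 10 to 12 to the following year and aggregate on it. The column name Season is made up for illustration. Note that the six rows shown average to about 217.75, so this sketch does not reproduce the quoted 217.53 exactly.

```r
# The six rows shown in the question.
rain <- data.frame(
  month = c(10, 11, 12, 1, 2, 3),
  Year  = c(2010, 2010, 2010, 2011, 2011, 2011),
  Rain  = c(376.8, 282.78, 324.58, 73.51, 225.89, 22.96)
)

# Assumption: months 10-12 belong to the season that ends in the next year.
rain$Season <- ifelse(rain$month >= 10, rain$Year + 1, rain$Year)

# One mean per season instead of one mean per calendar year.
season_mean <- aggregate(Rain ~ Season, data = rain, FUN = mean)
```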

How do I lag Quarters in r?

First and foremost - thank you for viewing my question - regardless of if you answer or not.
I am trying to add a column that contains the lagged values of the Quarter value to my DF, however, I get the below warning when I do so:
Warning messages:
1: In mutate_impl(.data, dots) :
Vectorizing 'yearqtr' elements may not preserve their attributes
Below is my sample data (my data starts on 1/3/2018)
Ticker Price Date Quarter
A 10 1/3/18 2018 Q1
A 13.5 2/15/18 2018 Q1
A 12.9 4/2/18 2018 Q2
A 11.2 5/3/18 2018 Q2
B 35.2 1/4/18 2018 Q1
B 33.1 3/2/18 2018 Q1
B 31 4/6/18 2018 Q2
... ... ... ...
XYZ 102 5/6/18 2018 Q2
I have a huge table with multiple stocks and multiple dates. This is how I calculate the quarter column:
df$quarter <- lag(as.yearqtr(df$Date))
However, I can't manage to add a column that lags the values of Quarter. Would anyone know a possible workaround?
I would like the below output:
Ticker Price Date Quarter Lag_Q
A 10 1/3/18 2018 Q1 NA
A 13.5 2/15/18 2018 Q1 NA
A 12.9 4/2/18 2018 Q2 2018 Q1
A 11.2 5/3/18 2018 Q2 2018 Q1
B 35.2 1/4/18 2018 Q1 NA
B 33.1 3/2/18 2018 Q1 NA
B 31 4/6/18 2018 Q2 2018 Q1
... ... ... ...
XYZ 102 5/6/18 2018 Q2 2018 Q1
First, I'd suggest organizing your data so that each column holds the prices of an individual security and each row is a specific date. From there, you can transform all securities easily, but I'm not sure what your end goal is. The xts package is excellent, has been optimized in C, and is more or less the securities industry standard. I highly suggest exploring it, but that's beyond the scope of your post!
For your data structure though, a single line should do:
df$lag_Q <- as.yearqtr( ifelse(test = (df$quarter=="2018 Q1"),
yes = NA,
no = df$quarter-0.25) )
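The line above hard-codes "2018 Q1" as the first quarter. As a hedged sketch of a more general variant, the first quarter can be looked up per ticker, so that the lag is NA only in each ticker's earliest quarter. It assumes the zoo package is installed and that Date is a character column in month/day/two-digit-year form, as in the sample data. The intermediate names q_num, first_q and lag_num are made up for illustration.

```r
library(zoo)

# Sample rows taken from the question.
df <- data.frame(
  Ticker = c("A", "A", "A", "A", "B", "B", "B"),
  Price  = c(10, 13.5, 12.9, 11.2, 35.2, 33.1, 31),
  Date   = c("1/3/18", "2/15/18", "4/2/18", "5/3/18",
             "1/4/18", "3/2/18", "4/6/18"),
  stringsAsFactors = FALSE
)

# Parse the dates explicitly, then derive the quarter.
df$Quarter <- as.yearqtr(as.Date(df$Date, format = "%m/%d/%y"))

# A yearqtr is stored as year + (quarter - 1) / 4, so one quarter is 0.25.
q_num   <- as.numeric(df$Quarter)
first_q <- ave(q_num, df$Ticker, FUN = min)   # earliest quarter per ticker

# Previous quarter, or NA when the row is in the ticker's first quarter.
lag_num  <- ifelse(q_num > first_q, q_num - 0.25, NA)
df$Lag_Q <- as.yearqtr(lag_num)
```

Working on plain numbers inside ifelse and converting back with as.yearqtr at the end avoids losing the yearqtr class along the way.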

Convert fromJSON list to a data frame

I am getting data from BLS website using the package blsAPI.
The code is:
library(blsAPI)
employ <- blsAPI(payload= "CES0500000001")
emp <- fromJSON(employ)
The data set emp is a list... this is where I am stumped. I've been trying all types of variations to convert emp to data.frame from list with no success.
Just set the argument return_data_frame = TRUE in the blsAPI function. A data.frame will then be returned instead of a list (the default behaviour).
library(rjson)
library(blsAPI)
response <- blsAPI("CES0500000001", return_data_frame = TRUE)
head(response)
Output:
year period periodName value seriesID
1 2018 M08 August 126939 CES0500000001
2 2018 M07 July 126735 CES0500000001
3 2018 M06 June 126582 CES0500000001
4 2018 M05 May 126390 CES0500000001
5 2018 M04 April 126130 CES0500000001
6 2018 M03 March 125956 CES0500000001
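Once the result is a data frame, it is often convenient to have a numeric value column and a proper Date column. The following is a small sketch of that step on a hand-built data frame that mimics the first rows of the output above. It assumes the columns may arrive as character and that every period is a monthly code from M01 to M12; other period codes are not handled.

```r
# Hand-built stand-in for the first rows of the response shown above.
response <- data.frame(
  year       = c("2018", "2018", "2018"),
  period     = c("M08", "M07", "M06"),
  periodName = c("August", "July", "June"),
  value      = c("126939", "126735", "126582"),
  seriesID   = "CES0500000001",
  stringsAsFactors = FALSE
)

# Make the value numeric so it can be used in calculations and plots.
response$value <- as.numeric(response$value)

# Turn "M08" into 8 and build a first-of-month Date from year and month.
month_num <- as.integer(sub("^M", "", response$period))
response$date <- as.Date(sprintf("%s-%02d-01", response$year, month_num))

# Sort oldest first, since the output above is newest first.
response <- response[order(response$date), ]
```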

R - split data to hydrological quarters

I wish to split my data sets into quarters according to the definition of the hydrological year. According to Wikipedia, "Due to meteorological and geographical factors, the definition of the water years varies". In the USA, the hydrological year is the period between October 1st of one year and September 30th of the next.
I use the definition of the hydrological year for Poland (it starts on November 1st and ends on October 31st).
The sample data set looks as follows:
sampleData <- structure(list(date = structure(c(15946, 15947, 15875, 15910, 15869, 15888, 15823, 16059, 16068, 16067), class = "Date"),`example value` = c(-0.325806595888448, 0.116001346459147, 1.68884381116696, -0.480527505762716, -0.50307381813168,-1.12032214801472, -0.659699514672226, -0.547101497279717, 0.729148872679021,-0.769760735764215)), .Names = c("date", "example value"), row.names = c(NA, -10L), class = "data.frame")
For some reason, the cut function in my code complains that "breaks" and "labels" differ in length (but they don't). If I omit the "labels" option in cut (as below), the function works perfectly.
What is wrong with labels?
ToHydroQuarters <-function(df)
{
result <- df
yearStart <- as.numeric(format(min(df$date),'%Y'))-1
#Hydrological year in Poland starts at November 1st
DateStart <- as.Date(paste(yearStart,"-11-01",sep=""))
breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter")
breakYear <- format(breaks,'%Y')
#Please, do not create labels in such way.
#Please note that for November and December we have next hydrological year - since it started at 1st November. So, we need to check month to decide which year we have (?) or use cut function again as mentioned here: http://stackoverflow.com/questions/22073881/hydrological-year-time-series
labels <- c(paste("Winter",breakYear[1]),
paste("Spring",breakYear[2]),
paste("Summer",breakYear[3]),
paste("Autumn",breakYear[4]),
paste("Autumn",breakYear[5]))
######Here is problem - once I add labels parameter, function complains about different lengths
result$hydroYear <- cut(df$date, breaks)
result
}
First, I think it is unwise to hard-code the labels inside a function, since that is impossible to check without some kind of reproducible example. However, I can see what you're trying to achieve.
You say that your breaks and labels have matching lengths, but the function itself doesn't always work. Even without the labels, cut does not assign a quarter to the last portion of the dates.
For example:
library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))
a <- ToHydroQuarters(df)
tail(a)
returns:
date hydroYear
971 2011-08-29 <NA>
972 2011-08-30 <NA>
973 2011-08-31 <NA>
974 2011-09-01 <NA>
975 2011-09-02 <NA>
976 2011-09-03 <NA>
Doing something like breaks <- seq(from=DateStart, to=max(df$date)+90, by="quarter"), does resolve that issue, as it forces a break to actually exist. This might solve your labelling issue that you've had in your function, but it does not make the function "generic".
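On the labels question itself: cut() wants exactly one label per interval, and n breaks define n - 1 intervals, so a fixed list of five labels only fits when there happen to be six breaks. A minimal sketch of one way around this is to derive the labels from the breaks themselves. The sample dates, the season names keyed by start month, and the 92-day padding are assumptions chosen for illustration, following the hydrological year that starts on November 1st.

```r
# Illustrative dates only.
dates <- as.Date(c("2013-04-28", "2013-07-15", "2013-11-20", "2013-12-29"))

# Quarterly breaks starting on 1 November of the previous calendar year.
start  <- as.Date(paste0(as.numeric(format(min(dates), "%Y")) - 1, "-11-01"))
# No quarter is longer than 92 days, so this padding puts a break after
# the last date.
breaks <- seq(from = start, to = max(dates) + 92, by = "quarter")

# One label per interval: use the left edge of each interval.
left   <- head(breaks, -1)
month  <- format(left, "%m")
season <- c("11" = "Winter", "02" = "Spring", "05" = "Summer", "08" = "Autumn")

# A quarter starting in November belongs to the next hydrological year.
hydro_year <- as.numeric(format(left, "%Y")) + (month == "11")
labels     <- paste(season[month], hydro_year)

quarters <- cut(dates, breaks = breaks, labels = labels)
```

Because labels is built from head(breaks, -1), its length always equals the number of intervals, whatever date range is passed in.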
Personally on the coding side I think it would be better to convert the month, and year parts separately, because it would be easier to understand. For example, you could use library(lubridate) to easily extract the month and specify the breaks and the labels as you normally would. I was thinking the function could look something like this:
thq <- function(date) {
mnth <- cut(month(date), breaks=c(1,4,7, 10, 12),
right=FALSE, include.lowest=TRUE,
labels=c("Spring", "Summer", "Autumn", "Winter"))
return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}
So then using some dummy data ...
library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="month")))
thq <- function(date) {
mnth <- cut(month(date), breaks=c(1,4,7, 10, 12),
right=FALSE, include.lowest=TRUE,
labels=c("Spring", "Summer", "Autumn", "Winter"))
return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}
df$newdate <- thq(df$date)
Which has the following output:
date newdate
1 2009-01-01 Spring 2009
2 2009-02-01 Spring 2009
3 2009-03-01 Spring 2009
4 2009-04-01 Summer 2009
5 2009-05-01 Summer 2009
6 2009-06-01 Summer 2009
7 2009-07-01 Autumn 2009
8 2009-08-01 Autumn 2009
9 2009-09-01 Autumn 2009
10 2009-10-01 Winter 2010
11 2009-11-01 Winter 2010
12 2009-12-01 Winter 2010
13 2010-01-01 Spring 2010
14 2010-02-01 Spring 2010
15 2010-03-01 Spring 2010
16 2010-04-01 Summer 2010
17 2010-05-01 Summer 2010
18 2010-06-01 Summer 2010
19 2010-07-01 Autumn 2010
20 2010-08-01 Autumn 2010
21 2010-09-01 Autumn 2010
22 2010-10-01 Winter 2011
23 2010-11-01 Winter 2011
24 2010-12-01 Winter 2011
25 2011-01-01 Spring 2011
26 2011-02-01 Spring 2011
27 2011-03-01 Spring 2011
28 2011-04-01 Summer 2011
29 2011-05-01 Summer 2011
30 2011-06-01 Summer 2011
31 2011-07-01 Autumn 2011
32 2011-08-01 Autumn 2011
33 2011-09-01 Autumn 2011
You can shift the months using the modulo operator if it is in a weird order...
thq <- function(date) {
mnth <- cut(((month(date)+1) %% 12), breaks=c(0, 3, 6, 9, 12),
right=FALSE, include.lowest=TRUE,
labels=c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct")
)
# you will need to alter the return statement yourself, because
# I feel there is enough information for you to do it, rather than
# me changing it every time you change the question.
return(paste(mnth, ifelse(mnth == "Winter", year(date)+1, year(date))))
}
library(lubridate)
x <- ymd(c("09-01-01", "09-01-02", "11-09-03"))
df <- data.frame(date=as.Date(seq(from=min(x), to=max(x), by="day")))
df$new <- thq(df$date)
head(df)
output:
> head(df)
date new
1 2009-01-01 Nov_Jan 2009
2 2009-01-02 Nov_Jan 2009
3 2009-01-03 Nov_Jan 2009
4 2009-01-04 Nov_Jan 2009
5 2009-01-05 Nov_Jan 2009
6 2009-01-06 Nov_Jan 2009
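The modulo version above leaves the return statement unchanged, so the year part is always the calendar year. As a hedged sketch of how it might be completed for a hydrological year starting on November 1st, the year can be bumped by one for November and December. This version uses only base R instead of lubridate, and the function name hydro_quarter is made up for illustration.

```r
# Label each date with its hydrological quarter and hydrological year,
# assuming the year starts on 1 November. `hydro_quarter` is hypothetical.
hydro_quarter <- function(date) {
  m <- as.integer(format(date, "%m"))
  # Shift months so that Nov -> 0, Dec -> 1, Jan -> 2, ..., Oct -> 11.
  shifted <- (m + 1) %% 12
  q <- cut(shifted, breaks = c(0, 3, 6, 9, 12), right = FALSE,
           labels = c("Nov_Jan", "Feb_Apr", "May_Jul", "Aug_Oct"))
  # November and December already belong to the next hydrological year.
  hy <- as.integer(format(date, "%Y")) + (m >= 11)
  paste(q, hy)
}

example_dates <- as.Date(c("2009-11-01", "2009-12-31", "2010-01-15",
                           "2010-02-01", "2010-10-31"))
example_labels <- hydro_quarter(example_dates)
```

With this, November 2009, December 2009 and January 2010 all share the label "Nov_Jan 2010".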
