Convert a fromJSON list to a data frame in R

I am getting data from the BLS website using the blsAPI package.
The code is:
library(rjson) # provides fromJSON()
library(blsAPI)
employ <- blsAPI(payload= "CES0500000001")
emp <- fromJSON(employ)
The resulting emp is a list, and this is where I am stumped. I've tried many variations to convert emp from a list to a data.frame, with no success.

Just set the argument return_data_frame = TRUE in the blsAPI() call. A data.frame is then returned instead of the default JSON response (which fromJSON() turns into a list), so there is nothing left to convert.
library(rjson)
library(blsAPI)
response <- blsAPI("CES0500000001", return_data_frame = TRUE)
head(response)
Output:
year period periodName value seriesID
1 2018 M08 August 126939 CES0500000001
2 2018 M07 July 126735 CES0500000001
3 2018 M06 June 126582 CES0500000001
4 2018 M05 May 126390 CES0500000001
5 2018 M04 April 126130 CES0500000001
6 2018 M03 March 125956 CES0500000001
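If you already have the parsed list and want to convert it yourself, one way is to turn each observation into a one-row data frame and stack the rows. The sketch below is an illustration only: the emp object is a hypothetical, hand-built list whose nesting (Results, then series, then data) is assumed to match what rjson::fromJSON() returns for a BLS reply. Inspect your own object with str(emp) and adjust the field names if they differ.

```r
# Hypothetical parsed response, mimicking the nesting that
# rjson::fromJSON() is assumed to produce for a BLS API reply:
# Results -> series -> data (one named list per observation).
emp <- list(
  Results = list(
    series = list(
      list(
        seriesID = "CES0500000001",
        data = list(
          list(year = "2018", period = "M08", periodName = "August",
               value = "126939", footnotes = list(list())),
          list(year = "2018", period = "M07", periodName = "July",
               value = "126735", footnotes = list(list()))
        )
      )
    )
  )
)

series <- emp$Results$series[[1]]

# Build a one-row data frame per observation, keeping only the scalar
# fields (footnotes is a nested list and is dropped), then stack the rows.
rows <- lapply(series$data, function(obs) {
  data.frame(year = obs$year,
             period = obs$period,
             periodName = obs$periodName,
             value = as.numeric(obs$value),
             seriesID = series$seriesID,
             stringsAsFactors = FALSE)
})
emp_df <- do.call(rbind, rows)
emp_df
```

The value field arrives as text in this assumed structure, hence the as.numeric() call.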

Related

Select data by month and year in R

I have a data frame ordered by month and year. I want to select only a whole number of years, i.e. if the data start in July 2002 and end in September 2010, select only the data from July 2002 to June 2010.
Likewise, if the data start in September 1992 and end in March 2000, select only the data from September 1992 to August 1999, regardless of any missing months in between.
The data can be downloaded from the link given in the original question.
The code:
mydata <- read.csv("E:/mydata.csv", stringsAsFactors=TRUE)
This is the manual selection:
selected.data <- mydata[1:73,] # July 2002 to June 2010
How can I achieve this programmatically?
Here is a base R solution that reproduces your manual subsetting:
mydata <- read.csv("D:/mydata.csv", stringsAsFactors = FALSE)
# Lookup table from month names to month numbers
lookup <-
c(
January = 1,
February = 2,
March = 3,
April = 4,
May = 5,
June = 6,
July = 7,
August = 8,
September = 9,
October = 10,
November = 11,
December = 12
)
# Replace each month name with its number
mydata$Month <- unname(lookup[mydata$Month])
first.month <- mydata$Month[1]
last.year <- max(mydata$Year)
# Keep every row up to the month preceding first.month in the last year
mydata[1:which(mydata$Month == (first.month - 1) & mydata$Year == last.year), ]
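Basically, I convert the month names to numbers, then keep every row up to the month that precedes the first month of the data frame, within the data frame's last year. This assumes the data do not start in January and that the preceding month is present in the last year.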
Here's a base R one-liner:
result <- mydata[seq_len(with(mydata, which(Month == month.name[match(Month[1],
month.name) - 1] & Year == max(Year)))), ]
head(result)
# Month Year var
#1 July 2002 -91.22997
#2 October 2002 -91.19007
#3 December 2002 -91.05395
#4 February 2003 -91.16958
#5 March 2003 -91.17881
#6 April 2003 -91.15110
tail(result)
# Month Year var
#68 December 2009 -90.92610
#69 January 2010 -91.07379
#70 February 2010 -91.12460
#71 March 2010 -91.10288
#72 April 2010 -91.06040
#73 June 2010 -90.94212
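Both answers above look for the month preceding the first month inside the last year, so they fail when the data start in January or when that particular month is missing. A sketch of an alternative, assuming (as the question states) that the rows are in chronological order: convert each row to a running month count, work out how many complete 12-month years fit between the first and last rows, and keep the rows inside that window. The small mydata below is made up for illustration. For the question's ranges, this arithmetic keeps July 2002 to June 2010 and September 1992 to August 1999.

```r
# Small made-up example; the real data come from the question's CSV.
mydata <- data.frame(
  Month = c("July", "October", "December", "March", "June", "July", "September"),
  Year  = c(2002, 2002, 2002, 2003, 2003, 2003, 2003),
  var   = c(1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7),
  stringsAsFactors = FALSE
)

# Running month index: months elapsed since January of year 0
month_index <- mydata$Year * 12 + match(mydata$Month, month.name) - 1

start <- month_index[1]
# Number of complete 12-month years between the first and last rows
n_years <- (max(month_index) - start + 1) %/% 12

# Keep rows falling inside the first n_years * 12 months
result <- mydata[month_index < start + 12 * n_years, ]
result
```

Here the data run from July 2002 to September 2003 (15 months), so one full year is kept: July 2002 through June 2003. Missing months do not matter because the cut-off is computed from the index, not looked up as a row.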

Updating a numeric column with characters in R

I have a column like this in the Data data frame:
Month
3
6
9
3
6
9
3
6
9
...
I want to replace 3 with March, 6 with June and 9 with September. I know how to do it for two months, e.g. 3 and 10, with: mutate(Data, Month=if_else(Month==3,"March","October")). How can I do it for three months?
Expected output:
Month
March
June
September
March
June
September
March
June
September
...
You can simply use the numeric month values to index month.name, R's built-in vector of month names (January is at index 1):
Data <- data.frame(Month=c(3,6,9))
Data$MonthName <- month.name[Data$Month]
Data
Month MonthName
1 3 March
2 6 June
3 9 September
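Since the question asks to update the Month column itself, the same indexing can overwrite it in place. For codes that are not calendar months, an explicit mapping generalises the two-value if_else() to any number of values; the sketch below uses base R factor() with levels and labels for that, which is one possible choice rather than the only one (dplyr::case_when() is another).

```r
Data <- data.frame(Month = c(3, 6, 9, 3, 6, 9, 3, 6, 9))

# Overwrite the numeric column in place: month.name[3] is "March", etc.
Data$Month <- month.name[Data$Month]
Data

# Explicit mapping for arbitrary codes (not tied to 1-12):
# factor() pairs each level with a label, as.character() drops the factor class.
codes <- c(3, 6, 9, 3)
labelled <- as.character(factor(codes,
                                levels = c(3, 6, 9),
                                labels = c("March", "June", "September")))
labelled
```

Any code not listed in levels becomes NA, which makes unexpected values easy to spot.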

Combining unequal data frames and applying a calculation

I've been doing some data cleaning and regressions, and now I would like to apply the output. However, I'm stuck on the following problem.
One data frame is called "Historical" and looks like this:
Year Value
2014 5
2015 7.5
2016 11
The other data frame is called "forecast" and looks like this (new years in the future):
Year Growth
2017 0.05
2018 0.11
etc
So I would like one data frame showing the historical values followed by the forecast values, starting in 2017 with 11 * 1.05.
How can I go about this?
Much appreciated
Given
a <- read.table(header=T, text="Year Value
2014 5
2015 7.5
2016 11")
b <- read.table(header=T, text="
Year Growth
2017 0.05
2018 0.11")
You could, for example, do:
rbind(a, cbind(
Year=b$Year,
Value=cumprod(c(tail(a$Value, 1), 1+b$Growth))[-1])
)
# Year Value
# 1 2014 5.0000
# 2 2015 7.5000
# 3 2016 11.0000
# 4 2017 11.5500
# 5 2018 12.8205
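The cumprod() call compounds the last historical value by each growth rate in turn: 11 * 1.05 = 11.55 for 2017, then 11.55 * 1.11 = 12.8205 for 2018. A sketch of the same computation written with data.frame() instead of cbind(), plus a Type column (a hypothetical addition, not part of the question) that records which rows are forecasts:

```r
a <- data.frame(Year = 2014:2016, Value = c(5, 7.5, 11))
b <- data.frame(Year = 2017:2018, Growth = c(0.05, 0.11))

# Each forecast is the last observed value compounded by every growth
# rate up to that year: 11 * 1.05, then 11 * 1.05 * 1.11, ...
forecast <- data.frame(
  Year  = b$Year,
  Value = tail(a$Value, 1) * cumprod(1 + b$Growth)
)

# Label the origin of each row, then stack historical and forecast rows
combined <- rbind(
  cbind(a, Type = "historical"),
  cbind(forecast, Type = "forecast")
)
combined
```

Multiplying by cumprod(1 + Growth) avoids prepending the last value and dropping it again with [-1].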

rvest: select an option and submit form

I am trying to extract the unemployment rate data from this site. The form contains a select tag with several options. I can extract the table for the default range of 2007 to 2017, but I am having a hard time setting values for from_year and to_year. Here is the code I have so far:
library(rvest)
session = html_session("https://data.bls.gov/timeseries/LNS14000000")
form = read_html("https://data.bls.gov/timeseries/LNS14000000") %>% html_node("table form") %>% html_form()
# set_values() returns a modified copy of the form; it is not assigned back to form here
set_values(form, from_year = 2000, to_year = as.numeric(format(Sys.Date(), "%Y"))) # nothing happened if I set the value for years
submit_form(session, form)
It doesn't work as expected.
Thanks so much, @Andrew!
I can use the API to extract the data:
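library(rjson)
library(blsAPI)
# Request parameters: series ID and the year range to fetch
uer1 <- list(
'seriesid'=c('LNS14000000'),
'startyear'=2000,
'endyear'=2009)
# Use version 2 of the API and return a data frame instead of raw JSON
response <- blsAPI(uer1, api_version = 2, return_data_frame = TRUE)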
The response looks like:
year period periodName value seriesID
1 2009 M12 December 9.9 LNS14000000
2 2009 M11 November 9.9 LNS14000000
3 2009 M10 October 10.0 LNS14000000
4 2009 M09 September 9.8 LNS14000000
5 2009 M08 August 9.6 LNS14000000
6 2009 M07 July 9.5 LNS14000000
...
Note that the API imposes some query limits (see the BLS API limits documentation).
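One limit caps how many years a single request may span, and the exact cap depends on the API version and on whether you use a registration key, so max_years = 10 below is an assumption to check against the BLS limits page. A sketch of a hypothetical helper, year_windows(), that splits a long range into windows and builds one payload per window in the same shape as uer1:

```r
# Hypothetical helper: split [start, end] into windows of at most max_years years.
# max_years = 10 is an assumed limit; check the BLS API documentation.
year_windows <- function(start, end, max_years = 10) {
  starts <- seq(start, end, by = max_years)
  ends <- pmin(starts + max_years - 1, end)
  data.frame(startyear = starts, endyear = ends)
}

windows <- year_windows(2000, 2018, max_years = 10)

# One payload per window, shaped like uer1
payloads <- lapply(seq_len(nrow(windows)), function(i) {
  list(seriesid = "LNS14000000",
       startyear = windows$startyear[i],
       endyear = windows$endyear[i])
})

# Each payload could then be sent with blsAPI(p, api_version = 2,
# return_data_frame = TRUE) and the resulting data frames combined with rbind().
windows
```

Only the payload construction is shown, so the sketch runs without network access.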

Need clarification on the calculation of average polarity score returned by sentiment function of sentimentr(trinker)

I am using the sentiment analysis function sentiment_by() from the R package sentimentr (by trinker). I have a data frame containing the following columns:
review comments
month
year
I ran sentiment_by() on the data frame, grouped by year and month, to find the average polarity score, and got the following values:
review_year review_month word_count sd ave_sentiment
2015 March 8722 0.381686065 0.163440921
2015 April 7758 0.387046768 0.158812775
2015 May 7333 0.389256472 0.149220636
2015 November 14020 0.394711478 0.14691745
2016 February 7974 0.400406931 0.142345278
2015 September 8238 0.379989344 0.141740366
2015 February 7642 0.361415304 0.141624745
2015 December 24863 0.387409099 0.141606892
2016 March 8229 0.389033232 0.138552943
2016 January 10472 0.388300946 0.134302612
2015 August 7520 0.3640285 0.127980712
2016 May 3432 0.422246851 0.125041218
2015 June 8678 0.356612924 0.119333949
2015 January 9930 0.351126449 0.119225549
2016 April 9344 0.397066458 0.111879315
2015 July 8450 0.349963536 0.108881821
2015 October 7630 0.38017201 0.1044298
Next, I ran sentiment_by() on the comments alone (no grouping), and then ran the following data.table expression on the resulting data frame to average the polarity score by year and month:
sentiment_df[,list(avg=mean(ave_sentiment)),by="month,year"]
I get the following results.
month year avg
January 2015 0.110950199
February 2015 0.126943461
March 2015 0.146546669
April 2015 0.148264268
May 2015 0.143924126
June 2015 0.110691204
July 2015 0.106472437
August 2015 0.118976304
September 2015 0.135362187
October 2015 0.111441484
November 2015 0.137699548
December 2015 0.136786867
January 2016 0.128645808
February 2016 0.129139898
March 2016 0.134595706
April 2016 0.12106743
May 2016 0.142801514
As I understand it, both approaches should return the same results; correct me if I am wrong. I chose the second approach because I need the average polarity both by month and year and by month alone, and I don't want to run sentiment_by() twice because of the extra run time. Could someone tell me what I am doing wrong?
Here is an idea: maybe the first call averages over the individual sentences in each month, while the second averages the per-comment ave_sentiment values, which are already averages. An average of averages is not, in general, equal to the average of the individual elements; for plain means they agree only when every group has the same number of elements.
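A toy illustration of that point with made-up sentence scores and plain means. Note that sentiment_by() may not use a plain mean internally (check its averaging.function argument), so this shows only the arithmetic, not sentimentr's exact numbers.

```r
# Toy sentence-level scores for two comments in the same month
comment1 <- c(0.5, 0.5, 0.5, 0.5)   # 4 sentences
comment2 <- c(-0.1)                 # 1 sentence

# Average over all sentences (what a grouped call conceptually does)
overall <- mean(c(comment1, comment2))
# Average of the per-comment averages (the second approach)
avg_of_avgs <- mean(c(mean(comment1), mean(comment2)))
# Weighting each comment's average by its sentence count recovers the overall mean
weighted <- weighted.mean(c(mean(comment1), mean(comment2)), w = c(4, 1))

c(overall = overall, avg_of_avgs = avg_of_avgs, weighted = weighted)
```

Here the overall mean is 1.9 / 5 = 0.38, while the average of the two comment averages is (0.5 - 0.1) / 2 = 0.2: the one-sentence comment gets as much weight as the four-sentence one.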
