Extract results after POST query - R

I am trying to automatically extract electricity offers from this site. Once I set the postcode (e.g. 3000), I can download the PDF files manually.
I am using the httr package:
library(httr)
library(XML)  # htmlParse() comes from the XML package
qr <- POST("http://www.qenergy.com.au/What-Are-Your-Options",
           query = list(postcode = 3000))
res <- htmlParse(content(qr))
The problem is that the file URLs are not in the query response. Any help would be appreciated.

Try sending the postcode as a form-encoded body instead of a query string:
library(httr)
qr <- POST("http://www.qenergy.com.au/What-Are-Your-Options",
           encode = "form",
           body = list(postcode = 3000))
res <- content(qr)
pdfs <- as(res['//a[contains(@href, "pdf")]/@href'], "character")
head(pdfs)
# [1] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-5-Day-Time-of-Use-A210.pdf"
# [2] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-7-Day-Time-of-Use-A250.pdf"
# [3] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate-CL.pdf"
# [4] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate.pdf"
# [5] "flux-content/qenergy/pdf/VIC price fact sheet united energy distribution zone business/United-Freedom-Biz-5-Day-Time-of-Use.pdf"
# [6] "flux-content/qenergy/pdf/VIC price fact sheet united energy distribution zone business/United-Freedom-Biz-7-Day-Time-of-Use.pdf"

How to deal with extended (280-character) tweets using the twitteR package

Twitter recently expanded the character limit of a tweet to 280 characters.
Since then, the twitteR package only retrieves (or only shows, I'm not sure which) the first 140 characters of an extended tweet.
# load package
library(twitteR)
# set oauth
setup_twitter_oauth(Consumer_Key, Consumer_Secret, Access_Token, Access_Token_Secret)
# get user timeline
k <- userTimeline("SenateFloor", n = 50, includeRts = TRUE)
# convert to data frame
k <- twListToDF(k)
# print tweet text
print(k$text[1:5])
Console output:
[1] "#Senate in at 4:00 PM. Following Leader remarks, will proceed to Executive Session & resume consideration of Cal. #… https:// t.co/BpcPa15Twp"
[2] "RT #GovTop: Weekly Digest of the #CongressionalRecord https:// t.co/vuH71y8FpH"
[3] "#HJRes123 ( Making further continuing appropriations for fiscal year 2018). The Joint Resolution was agreed to by a… https:// t.co/bquyMPPhhm"
[4] "#HJRes123 ( Making further continuing appropriations for fiscal year 2018). https:// t.co/SOmYJ3Dv4t"
[5] "Cal. #167, Susan Bodine to be Assistant Administrator of the Environmental Protection Agency. The nomination was co… https:// t.co/pW7qphwloh"
As you can see, an ellipsis (…) cuts off the tweets that exceed the 140-character limit.
> nchar(k$text[1:5])
[1] 144 77 140 99 140
Is there any way to get the whole text of these extended tweets?
As noted in the comment, just use rtweet:
library(rtweet)
library(tidyverse)
sen_df <- get_timeline("SenateFloor", 300)
mutate(sen_df, `Tweet Length` = map_dbl(text, nchar)) %>%
  ggplot(aes(`Tweet Length`)) +
  ggalt::geom_bkde(color = "steelblue", fill = "steelblue", alpha = 2/3) +
  scale_y_continuous(expand = c(0, 0)) +
  labs(title = "#SenateFloor Tweet Length Distribution") +
  hrbrthemes::theme_ipsum_rc(grid = "XY")
If you would like to continue to use twitteR, you could try passing tweet_mode = "extended" through to the API:
# get user timeline
k <- userTimeline("SenateFloor", n = 50, includeRts = TRUE, tweet_mode = "extended")

Obtaining only numeric output from viewFinancials without additional text

I calculated the dividend yield of Microsoft the following way:
# load financial data for MSFT
library(quantmod)
getFinancials('MSFT')
# calculate dividend yield for MSFT
as.numeric(first(
  -viewFinancials(MSFT.f, type = 'CF', period = 'A', subset = NULL)['Total Cash Dividends Paid', ] /
    viewFinancials(MSFT.f, type = 'BS', period = 'A', subset = NULL)['Total Common Shares Outstanding', ]
))
Here is the output:
Annual Cash Flow Statement for MSFT
Annual Balance Sheet for MSFT
[1] 1.40958
How is it possible to have only the numeric output 1.40958 without the additional text Annual Cash Flow Statement for MSFT and Annual Balance Sheet for MSFT? Is there a way to suppress those?
The two strings, "Annual Cash Flow Statement for MSFT" and "Annual Balance Sheet for MSFT", are messages from viewFinancials; they are not attached to the result in any way.
R> dy <- as.numeric(first(-viewFinancials(MSFT.f, type='CF', period='A',subset = NULL)['Total Cash Dividends Paid',]/viewFinancials(MSFT.f, type='BS', period='A',subset = NULL)['Total Common Shares Outstanding',]))
Annual Cash Flow Statement for MSFT
Annual Balance Sheet for MSFT
R> dy
[1] 1.40958
If you want to squelch the messages, use suppressMessages().
R> suppressMessages(dy <- as.numeric(first(-viewFinancials(MSFT.f, type='CF', period='A',subset = NULL)['Total Cash Dividends Paid',]/viewFinancials(MSFT.f, type='BS', period='A',subset = NULL)['Total Common Shares Outstanding',])))
R> dy
[1] 1.40958
R>
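If you compute this repeatedly, wrapping the whole thing keeps the call site clean. A sketch of my own (div_yield is a hypothetical helper, not part of quantmod):
div_yield <- function(fin) {
  # fin: an object returned by getFinancials(), e.g. MSFT.f
  suppressMessages({
    div    <- viewFinancials(fin, type = 'CF', period = 'A')['Total Cash Dividends Paid', ]
    shares <- viewFinancials(fin, type = 'BS', period = 'A')['Total Common Shares Outstanding', ]
  })
  as.numeric(first(-div / shares))
}
div_yield(MSFT.f)
# [1] 1.40958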

Loop through a large list to look for certain values

I have a notepad document listing all the zipcodes in America, and I want to look for a particular zipcode (specified by the user) and run some code on it that prints out a statement like "The temperature at [input] is 60". I want to do that for the specified zipcode and the next 10 sequential ones that appear in my zipcode dossier. My first issue is that I don't know how to convert this zipcode document containing all these numbers into a list, array, or whatever makes it easy to run a for loop on. I'm extremely new to R, so bear with me.
Input = "20904"# User provides an input for this
ZipData<-read.csv(file.path(wd,"DataImport","zip_code_data.csv"),
colClasses=c("character","NULL","factor","NULL","NULL","factor",
"NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL",
"NULL","NULL"),
col.names=c("zip","","city","","","state","","","","","","","","",
"",""))
ZipData<- as.numeric(ZipData$zip)
edit(ZipData) # Opens up a notepad document listing the zipcodes starting with "c(544, 601, 602, 603, 604, 605, 606, 610, 611, 612, 613, 614....)
# Note: typeof(ZipData) prints out "double"
# Bunch of code that ends with:
a <- paste("The current temperature for", cityName, Input, "is", temperature, sep=" ")
print(a)
I want to run this on the Input and the next 10 zipcodes that follow. I am having trouble formulating a for loop that loops through the entire list (I don't know if ZipData can be classified as a list), finds the user-specified zipcode, runs my block of code on it, and rinses and repeats for the next 10 zipcodes. My program should end with 10 print statements listing all of their temperatures. Any ideas?
This may be what you're referring to:
zip_plus10 <- function(input) {
  index <- which(zip.vector == as.numeric(input))
  paste('The current temperature for',
        city[index:(index + 10)],
        state[index:(index + 10)],
        zip.vector[index:(index + 10)],
        'is:',
        temps[index:(index + 10)])
}
zip_plus10('90210')
[1] "The current temperature for K HI 90210 is: 65"
[2] "The current temperature for L ID 90211 is: 66"
[3] "The current temperature for M IL 90212 is: 58"
[4] "The current temperature for N IN 90213 is: 110"
[5] "The current temperature for O IA 90214 is: 57"
[6] "The current temperature for P KS 90215 is: 91"
[7] "The current temperature for Q KY 90216 is: 90"
[8] "The current temperature for R LA 90217 is: 89"
[9] "The current temperature for S ME 90218 is: 108"
[10] "The current temperature for T MD 90219 is: 109"
[11] "The current temperature for U MA 90220 is: 55"
# Data
set.seed(444)
zip.vector <- seq(90200, 90221)
city <- LETTERS[1:length(zip.vector)]
state <- state.abb[1:length(zip.vector)]
temps <- sample(50:110, length(zip.vector))
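One caveat worth noting (my addition, not part of the original answer): index + 10 runs past the end of zip.vector when the input zipcode is near the end, producing NAs. A guarded variant of the same function, using the same sample data:
zip_plus10_safe <- function(input) {
  index <- which(zip.vector == as.numeric(input))
  if (length(index) == 0) stop("zipcode not found")
  idx <- index:min(index + 10, length(zip.vector))  # clamp to the last element
  paste('The current temperature for',
        city[idx], state[idx], zip.vector[idx],
        'is:', temps[idx])
}
zip_plus10_safe('90215')  # near the end of the sample data: returns 7 results, no NAs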

List files in R by date

I have a set of netCDF files organised by date in my directory (each file is one day of data). I read all the files into R using:
require(RNetCDF)
files <- list.files(pattern = "\\.nc$", full.names = TRUE)
When I run the code, R reads the 2013 and 2014 files first, and parts of 2010 appear at the end (see the sample output below):
"./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820223.SUB.nc"
"./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820224.SUB.nc"
"./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820225.SUB.nc"
"./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130829.SUB.nc"
"./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130830.SUB.nc"
"./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130831.SUB.nc"
"./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100626.SUB.nc"
"./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100827.SUB.nc"
"./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100828.SUB.nc"
I am trying to generate a daily time series from these files using a loop, so when I apply the rest of my code, the data from June to August 2010 ends up at the end of the daily time series. I suspect this has to do with how the files are listed in R.
Is there any way to list files in R and ensure they are organised by date?
Here are your files, unsorted:
paths <- c("./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820223.SUB.nc",
           "./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820224.SUB.nc",
           "./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820225.SUB.nc",
           "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130829.SUB.nc",
           "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130830.SUB.nc",
           "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130831.SUB.nc",
           "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100626.SUB.nc",
           "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100827.SUB.nc",
           "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100828.SUB.nc")
I'm using a regular expression to extract the 8-digit date, YYYYMMDD. Because the format runs year, then month, then day, sorting the digit strings sorts chronologically, but you can also convert them into dates:
## matches ...Nx.<number of digits = 8>... and captures the stuff in <>
## and saves this match to the first capture group, \\1
pattern <- '.*Nx\\.(\\d{8}).*'
gsub(pattern, '\\1', paths)
# [1] "19820223" "19820224" "19820225" "20130829" "20130830" "20130831"
# [7] "20100626" "20100827" "20100828"
sort(gsub(pattern, '\\1', paths))
# [1] "19820223" "19820224" "19820225" "20100626" "20100827" "20100828"
# [7] "20130829" "20130830" "20130831"
## not necessary to convert that into dates but you can
as.Date(sort(gsub(pattern, '\\1', paths)), '%Y%m%d')
# [1] "1982-02-23" "1982-02-24" "1982-02-25" "2010-06-26" "2010-08-27"
# [6] "2010-08-28" "2013-08-29" "2013-08-30" "2013-08-31"
And order the original paths
## so you can use the above to order the paths
paths[order(gsub(pattern, '\\1', paths))]
# [1] "./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820223.SUB.nc"
# [2] "./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820224.SUB.nc"
# [3] "./MERRA100.prod.assim.tavg1_2d_lnd_Nx.19820225.SUB.nc"
# [4] "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100626.SUB.nc"
# [5] "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100827.SUB.nc"
# [6] "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20100828.SUB.nc"
# [7] "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130829.SUB.nc"
# [8] "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130830.SUB.nc"
# [9] "./MERRA301.prod.assim.tavg1_2d_lnd_Nx.20130831.SUB.nc"
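Putting this together with the list.files() call from the question (a sketch; it assumes all file names follow the MERRA ...Nx.YYYYMMDD... pattern shown above):
files <- list.files(pattern = "\\.nc$", full.names = TRUE)
pattern <- '.*Nx\\.(\\d{8}).*'
files <- files[order(gsub(pattern, '\\1', files))]  # now in chronological order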

Quantmod FRED Metadata in R

library(quantmod)
getSymbols("GDPC1",src = "FRED")
I am trying to extract the numerical economic/financial data in FRED, but also the metadata. I am trying to chart CPI and use the metadata as labels/footnotes. Is there a way to extract this data using the quantmod package?
Title: Real Gross Domestic Product
Series ID: GDPC1
Source: U.S. Department of Commerce: Bureau of Economic Analysis
Release: Gross Domestic Product
Seasonal Adjustment: Seasonally Adjusted Annual Rate
Frequency: Quarterly
Units: Billions of Chained 2009 Dollars
Date Range: 1947-01-01 to 2014-01-01
Last Updated: 2014-06-25 7:51 AM CDT
Notes: BEA Account Code: A191RX1
Real gross domestic product is the inflation adjusted value of the
goods and services produced by labor and property located in the
United States.
For more information see the Guide to the National Income and Product
Accounts of the United States (NIPA) -
(http://www.bea.gov/national/pdf/nipaguid.pdf)
You can use the same code that's in the body of getSymbols.FRED, but change ".csv" to ".xls", then read the metadata you're interested in from the .xls file.
library(gdata)
Symbol <- "GDPC1"
FRED.URL <- "http://research.stlouisfed.org/fred2/series"
tmp <- tempfile()
download.file(paste0(FRED.URL, "/", Symbol, "/downloaddata/", Symbol, ".xls"),
              destfile = tmp)
read.xls(tmp, nrows=17, header=FALSE)
# V1 V2
# 1 Title: Real Gross Domestic Product
# 2 Series ID: GDPC1
# 3 Source: U.S. Department of Commerce: Bureau of Economic Analysis
# 4 Release: Gross Domestic Product
# 5 Seasonal Adjustment: Seasonally Adjusted Annual Rate
# 6 Frequency: Quarterly
# 7 Units: Billions of Chained 2009 Dollars
# 8 Date Range: 1947-01-01 to 2014-01-01
# 9 Last Updated: 2014-06-25 7:51 AM CDT
# 10 Notes: BEA Account Code: A191RX1
# 11 Real gross domestic product is the inflation adjusted value of the
# 12 goods and services produced by labor and property located in the
# 13 United States.
# 14
# 15 For more information see the Guide to the National Income and Product
# 16 Accounts of the United States (NIPA) -
# 17 (http://www.bea.gov/national/pdf/nipaguid.pdf)
Instead of hardcoding nrows=17, you can use grep to search for the row that has the headers of the data, and subset to only include rows before that.
dat <- read.xls(tmp, header=FALSE, stringsAsFactors=FALSE)
dat[seq_len(grep("DATE", dat[, 1])-1),]
unlink(tmp) # remove the temp file when you're done with it.
FRED has a straightforward, well-documented JSON interface at http://api.stlouisfed.org/docs/fred/ which provides both metadata and time series data for all of its economic series. Access requires a FRED account and API key, but these are available on request from http://api.stlouisfed.org/api_key.html.
The Excel-style descriptive data you asked for can be retrieved using:
get.FRSeriesTags <- function(seriesNam)
{
  # seriesNam = character string containing the ID identifying the FRED series to be retrieved
  library("httr")
  library("jsonlite")

  # dummy FRED api key; request valid key from http://api.stlouisfed.org/api_key.html
  apiKey   <- "&api_key=abcdefghijklmnopqrstuvwxyz123456"
  base     <- "http://api.stlouisfed.org/fred/"
  seriesID <- paste("series_id=", seriesNam, sep = "")
  fileType <- "&file_type=json"

  # get series descriptive data
  datType <- "series?"
  url     <- paste(base, datType, seriesID, apiKey, fileType, sep = "")
  series  <- fromJSON(url)$seriess

  # get series tag data
  datType <- "series/tags?"
  url     <- paste(base, datType, seriesID, apiKey, fileType, sep = "")
  tags    <- fromJSON(url)$tags

  # format as excel descriptive rows
  description <- data.frame(Title        = series$title[1],
                            Series_ID    = series$id[1],
                            Source       = tags$notes[tags$group_id == "src"][1],
                            Release      = tags$notes[tags$group_id == "gen"][1],
                            Frequency    = series$frequency[1],
                            Units        = series$units[1],
                            Date_Range   = paste(series[1, c("observation_start", "observation_end")], collapse = " to "),
                            Last_Updated = series$last_updated[1],
                            Notes        = series$notes[1],
                            row.names    = series$id[1])
  return(t(description))
}
Retrieving the actual time series data would be done in a similar way. There are several JSON packages available for R, but jsonlite works particularly well for this application.
There's a bit more to setting this up than in the previous answer, but it is perhaps worth it if you do much with FRED data.
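Assuming you substitute a valid API key for the dummy one, usage would look like this (a sketch; the output shape follows from the transposed data.frame built above):
meta <- get.FRSeriesTags("GDPC1")
meta["Title", ]  # "Real Gross Domestic Product", per the metadata shown in the question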
