Yahoo Finance ticker scraping using Library yfR - r

Wondering if there is a way to use yf_get() to grab a previousClose value from Yahoo Finance if there is no value for the date given. For example:
curr <- yf_get(tickers, first_date = Sys.Date() - 1, last_date = Sys.Date())
prev_q <- yf_get(tickers = tickers,
first_date = quarters[1]-.5,
last_date = quarters[1])
prev_q_closes <- round(prev_q$price_close,2)
prev_q2 <- yf_get(tickers = tickers,
first_date = quarters[2]-.5,
last_date = quarters[2], thresh_bad_data = 0.45)
prev_q2_closes <- round(prev_q2$price_close,2)
prev_q3 <- yf_get(tickers = tickers,
first_date = quarters[3]-.5,
last_date = quarters[3])
prev_q3_closes <- round(prev_q3$price_close,2)
close_prices <- cbind(prev_q3_closes, prev_q2_closes, prev_q_closes, current_close)
tick_close <- data.frame(Tickers = tickers,
ClosingPrices = close_prices,
PercentChanges = percent_changes)
dates <- rev(quarters)
dates_change <- as.character(format(dates, "%m-%d"))
#colnames(tick_close) <- c("Tickers", "Prev Q3 Closing Prices", "Prev Q2 Closing Prices", "Prev Q1 Closing Prices", "Current Closing Prices")
colnames(tick_close) <- c("Tickers", as.character(dates[1]), as.character(dates[2]), as.character(dates[3]), as.character(CurrentDate),
paste(dates_change[1],dates_change[2], sep=" to "), paste(dates_change[1], dates_change[3], sep=" to "),
paste(dates_change[1],as.character(format(CurrentDate, "%m-%d")), sep=" to "),
paste(dates_change[3],as.character(format(CurrentDate, "%m-%d")), sep=" to "))
etc and I then grab the previous quarterly date values, for some Chinese companies on Yahoo Finance they do not have a value for December 31st, and my algorithm falls apart because it cannot grab a different value. I want it to grab the December 30th value IF there is no December 31st value on Yahoo Finance!

Related

R Shiny DateRangeInput Select Only Specific Days

I'm working on creating a shiny dashboard using R Shiny. I'm using selectInput to choose a view(Daily, Weekly, Monthly, Yearly) and a dateRangeInput which will allow the user to choose the dates the plots should display. The plots are made using Plotly.
Custom date range input to allow min/max views for dates(credit here):
CustomDateRangeInput <- function(inputId, label, minview = "days", maxview = "decades", ...) {
d <- shiny::dateRangeInput(inputId, label, ...)
d$children[[2L]]$children[[1]]$attribs[["data-date-min-view-mode"]] <- minview
d$children[[2L]]$children[[3]]$attribs[["data-date-min-view-mode"]] <- minview
d$children[[2L]]$children[[1]]$attribs[["data-date-max-view-mode"]] <- maxview
d$children[[2L]]$children[[3]]$attribs[["data-date-max-view-mode"]] <- maxview
d
}
Code for creating the dateRangeInput based on the selected view:
output$asymp_dates <- renderUI({
switch (
input$id_view,
"Daily" = {
data <- GetDateRangeData(asymp, "day")
dateRangeInput("id_asymp_dates", "Date Range", min(data$Date), max(data$Date), min(data$Date), max(data$Date), "mm/dd/yyyy", "month")
},
"Weekly" = {
data <- GetDateRangeData(asymp, "week")
dateRangeInput("id_asymp_dates", "Date Range", min(data$Date), max(data$Date), min(data$Date), max(data$Date), "mm/dd/yyyy", "month")
},
"Monthly" = {
data <- GetDateRangeData(asymp, "month")
min_date <- min(floor_date(data$Date, "month"))
max_date <- max(ceiling_date(data$Date, "month")) - 1
CustomDateRangeInput("id_asymp_dates", "Date Range", "year", "decades", min_date, max_date, min_date, max_date, "MM, yyyy", "year")
},
"Yearly" = {
data <- GetDateRangeData(asymp, "year")
min_date <- min(floor_date(data$Date, "year"))
max_date <- max(ceiling_date(data$Date, "year")) - 1
CustomDateRangeInput("id_asymp_dates", "Date Range", "decades", "decades", min_date, max_date, min_date, max_date, "yyyy", "decades")
# The calculation is needed in order to reset the date range,
# but we don't want the date range editable.
shinyjs::disable("id_asymp_dates")
}
)
})
Everything works fine, but I would like for the weekly view to only be able to select the first/last day of each week so that my date ranges go from the start date being a beginning of a week and the end date being the end of a week.
ie:
dateRangeInput Calendar
March 8 - March 14, March 15 - March 21, and March 22 - March 28 are weeks. I want to be able to only select March 8, March 15, or March 22 as the first date and then March 14, March 21, or March 28 as my end date.

Extracting elements from a data.frame object in R

For each country ("BR", "MX", "RU"), I would like to create a data.frame object containing three columns: #INDICATOR, #TIME_PERIOD, and #OBS_VALUE.
library(IMFData)
IRFCL.available.codes <- DataStructureMethod("IRFCL")
names(IRFCL.available.codes)
indicators_IRFCL <- IRFCL.available.codes[[3]]
databaseID <- "IRFCL"
startdate = "2006-01-01"
enddate = " "
checkquery = FALSE
queryfilter <- list(CL_FREQ = "Q", CL_AREA_IRFCL = c("BR", "MX", "RU"), CL_INDICATOR_IRFCL = "", CL_SECTOR_IRFCL = "")
RESERVES <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery)
Inside RESERVES, one can find #INDICATOR and Obs (which contains #TIME_PERIOD and #OBS_VALUE for each #INDICATOR).
You can do:
library(IMFData)
library(dplyr)
RESERVES <- CompactDataMethod(databaseID, queryfilter, startdate, enddate, checkquery, tidy = T)
and then
RESERVES %>%
select(`#REF_AREA`, `#TIME_PERIOD`, `#INDICATOR`, `#OBS_VALUE`) %>%
group_by(`#REF_AREA`) %>%
group_split(keep = F) -> list_of_tibbles
This way you get a list of data.frames (tibbles)

Dynamic Time Series Function R

I currently have a time series of football data for weekly stats for variables such as shots and goals. I want to create a "form" function with input for number of games (specify date) and the variable of choice (shots, goals, etc) so that I can check the form of players for certain stats over the last 4 games, 6 games or whatever period I specify. The data frame is of the form:
week = as.vector(c(rep(25, 5), rep(26, 5), rep(27, 5)))
date = as.vector(c(rep("2019-08-09 15:00:00", 5), rep("2019-08-16 15:00:00", 5), rep("2019-08-23 15:00:00", 5)))
players = c("Player 1", "Player 2", "Player 3", "Player 4", "Player 5")
name = as.vector(c(rep(players, 3)))
goals = as.vector(sample(c(0:2), 15, replace = T))
shots = as.vector(sample(c(0:8), 15, replace = T))
data = data.frame(week, date, name, goals, shots)
Would it make sense to create a function using dplyr and input variables for time period and variable type? Or is there some package that will do this for me?
This answer could give you some idea how to filter the data frame for date or games played as specified in the comments:
library(tidyverse)
library(lubridate)
data = tibble(
week = rep(31:40, each = 2),
date = seq.Date(ymd("2019-01-01"), length.out = 20, by = "months"),
name = paste0("player", rep(1:4, each = 5)),
goals = sample(c(0:2), 20, replace = T),
shots = sample(c(0:8), 20, replace = T)
)
# last 3 months or after
data %>%
filter(date > (today() %m-% months(3) ))
# last 5 games
data %>%
filter(week > (max(week) - 4) )

What its mean by Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument in r

Code below used to web scrape a website using API call. I just have to change the startDate and endDate to get data set that I want. Previously it works fine, doing its loops wonderfully but after I did some modification on the html_nodes() part - try to extract different section in the webpage, it keep return me data of the same date. with error warning 'Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument in r'. What should be done here?
library(tidyverse)
library(readr)
library(tidyr)
library(dplyr)
library(xlsx)
library(beepr)
get_sounding_data <- function(region = c("naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"),
date,
from_hr = c("00", "12", "all"),
to_hr = c("00", "12", "all"),
station_number = 48615) {
# we use these pkgs (the readr and dplyr dependencies removed)
suppressPackageStartupMessages({
require("xml2", quietly = TRUE)
require("httr", quietly = TRUE)
require("rvest", quietly = TRUE)
})
# validate region
region <- match.arg(
arg = region,
choices = c(
"naconf", "samer", "pac", "nz", "ant",
"np", "europe", "africa", "seasia", "mideast"
)
)
# validates the date for us if it's a character string
date <- as.Date(date)
# get year and month
year <- as.integer(format(date, "%Y"))
stopifnot(year %in% 1973:as.integer(format(Sys.Date(), "%Y")))
year <- as.character(year)
month <- format(date, "%m")
# we need these to translate day & *_hr to the param the app needs
c(
"0100", "0112", "0200", "0212", "0300", "0312", "0400", "0412",
"0500", "0512", "0600", "0612", "0700", "0712", "0800", "0812",
"0900", "0912", "1000", "1012", "1100", "1112", "1200", "1212",
"1300", "1312", "1400", "1412", "1500", "1512", "1600", "1612",
"1700", "1712", "1800", "1812", "1900", "1912", "2000", "2012",
"2100", "2112", "2200", "2212", "2300", "2312", "2400", "2412",
"2500", "2512", "2600", "2612", "2700", "2712", "2800", "2812",
"2900", "2912", "3000", "3012", "3100", "3112"
) -> hr_vals
c(
"01/00Z", "01/12Z", "02/00Z", "02/12Z", "03/00Z", "03/12Z", "04/00Z",
"04/12Z", "05/00Z", "05/12Z", "06/00Z", "06/12Z", "07/00Z", "07/12Z",
"08/00Z", "08/12Z", "09/00Z", "09/12Z", "10/00Z", "10/12Z", "11/00Z",
"11/12Z", "12/00Z", "12/12Z", "13/00Z", "13/12Z", "14/00Z", "14/12Z",
"15/00Z", "15/12Z", "16/00Z", "16/12Z", "17/00Z", "17/12Z", "18/00Z",
"18/12Z", "19/00Z", "19/12Z", "20/00Z", "20/12Z", "21/00Z", "21/12Z",
"22/00Z", "22/12Z", "23/00Z", "23/12Z", "24/00Z", "24/12Z", "25/00Z",
"25/12Z", "26/00Z", "26/12Z", "27/00Z", "27/12Z", "28/00Z", "28/12Z",
"29/00Z", "29/12Z", "30/00Z", "30/12Z", "31/00Z", "31/12Z"
) -> hr_inputs
hr_trans <- stats::setNames(hr_vals, hr_inputs)
o_from_hr <- from_hr <- as.character(tolower(from_hr))
o_to_hr <- to_hr <- as.character(tolower(to_hr))
if ((from_hr == "all") || (to_hr == "all")) {
from_hr <- to_hr <- "all"
} else {
from_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(from_hr))]
match.arg(from_hr, hr_vals)
to_hr <- hr_trans[sprintf("%s/%02dZ", format(date, "%d"), as.integer(to_hr))]
match.arg(to_hr, hr_vals)
}
# clean up the station number if it was entered as a double
station_number <- as.character(as.integer(station_number))
# execute the API call
httr::GET(
url = "http://weather.uwyo.edu/cgi-bin/sounding",
query = list(
region = region,
TYPE = "TEXT:LIST",
YEAR = year,
MONTH = sprintf("%02d", as.integer(month)),
FROM = from_hr,
TO = to_hr,
STNM = station_number
)
) -> res
# check for super bad errors (that we can't handle nicely)
httr::stop_for_status(res)
# get the page content
doc <- httr::content(res, as="text")
# if the site reports no data, issue a warning and return an empty data frame
if (grepl("Can't get", doc)) {
doc <- xml2::read_html(doc)
msg <- rvest::html_nodes(doc, "body")
msg <- rvest::html_text(msg, trim=TRUE)
msg <- gsub("\n\n+.*$", "", msg)
warning(msg)
return(data.frame(stringsAsFactors=FALSE))
}
# if the site reports no data, issue a warning and return an empty data frame
if (grepl("Can't get", doc)) {
doc <- xml2::read_html(doc)
msg <- rvest::html_nodes(doc, "body")
msg <- rvest::html_text(msg, trim=TRUE)
msg <- gsub("\n\n+.*$", "", msg)
warning(msg)
return(data.frame(stringsAsFactors=FALSE))
}
# turn it into something we can parse
doc <- xml2::read_html(doc)
raw_dat <- doc %>%
html_nodes("h3+ pre") %>%
html_text()
indices <- raw_dat %>%
str_split(pattern = "\n", simplify = T) %>%
map_chr(str_squish) %>%
tibble(x = .) %>%
separate(x, into = c("Station", "Value"), sep = ": ") %>%
filter(!is.na(Value))
data <- tidyr::spread(indices, Station, Value)
data
}
startDate <- as.Date("01-11-1979", format="%d-%m-%y")
endDate <- as.Date("31-01-1980",format="%d-%m-%y")
days <- seq(startDate, endDate, "1 day")
lapply(days[1:92], function(day) {
get_sounding_data(
region = "seasia",
date = day,
from_hr = "00",
to_hr = "00",
station_number = "48615"
)
}) -> soundings_48615
warnings()
new_df <- map(soundings_48615, . %>% mutate_all(parse_guess))
dat <- bind_rows(new_df)
dat <- dat %>% separate(col =`Observation time`, into = c('Date', 'time'), sep = '/')
dat$Date <- as.Date(dat$Date, format = "%y%m%d")
#save in text file
write.csv(dat, 'c:/Users/Hp/Documents/yr/climatology/yr_SoundingIndexLowerPart/1979.csv')
get_sounding_data <- NULL
beep()
It seems that the error you are having right now, is date format based. More specific the following times
as.Date("01-11-1979", format="%d-%m-%y")
as.Date("31-01-1980",format="%d-%m-%y")
outputs
"2019-11-01"
"2019-01-31"
R's date/time format is international standard yyyy-mm-dd. As such the "2019-11-01" comes after "2019-01-31" time wise. As such this will cause sequence to fail if it tries to iterate 1 positive day at a time. The formatting is the problem here, for this to be solved there is a very simple solution. Always work in international standard date formats, as these will be recognized by (almost) all programs.
As such change the date sequence part of your code to
....
startDate <- as.Date("1979-11-01")
endDate <- as.Date("1980-01-31")
days <- seq(startDate, endDate, "1 day")
....
Note the format change. As for why it changes 1979 to 2019, i am not entirely sure, someone else might have a more intricate answer for this weird behaviour.
I got this error when I was trying to create a sequence of dates using the seq() function. Turns out I was putting the start and end dates in the wrong order.
This would give me the error:
seq(as.Date("2012-12-30"), as.Date("2011-01-04"), by="days")
Error in seq.int(0, to0 - from, by) : wrong sign in 'by' argument
But this would not give me the error and would give me my date sequence:
seq(as.Date("2011-12-30"), as.Date("2012-01-04"), by="days")

Weekly data plot in R

I have a big data frame (from 2007 to 2015), with data points at about every 2 minutes. I want to plot the graph of every week (from 2007 to 2015), with each week being automatically exported as a PNG file to my computer's folder. Previously, I was able to successfully produce working codes for annually, monthly, and daily plot. E.g.for yearly data:
for(j in 2007:2015){
mypath <- file.path("~", "Documents","Yearly", paste("WAO_AIR_Data_", j, ".png", sep = "" ))
png(filename = mypath, width = 963, height = 690)
timePlot(selectByDate(new_subdata, year = j),
pollutant = c("CO2", "O2", "APO"),
date.pad = TRUE,
pch = c(19,19,19),
cex = 0.2,
xlab = paste("Month of year in", j),
ylab = "CO2, O2, and APO concentrations",
name.pol = c("CO2 (ppm)", "O2 (per meg)", "APO (per meg)"),
)
dev.off()
}
The data frame looks like this
tail(new_subdata)
date CO2 O2 APO
1052042 2015-12-31 23:48:45 409.636 -666.39 -353.27
1052043 2015-12-31 23:50:46 409.652 -669.62 -356.41
1052044 2015-12-31 23:52:44 409.679 -669.44 -356.09
1052045 2015-12-31 23:54:46 409.703 -667.07 -353.59
1052046 2015-12-31 23:56:44 409.719 -671.02 -357.46
1052047 2015-12-31 23:58:46 409.734 NA NA
But I dont know how to produce the code for weekly plotting. Can anyone help me please? Thank you so much!
Via ?strptime, you can get the week out of a Date or POSIXct with %U
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
x <- Sys.time()
class(x); format(x, '%U')
# [1] "POSIXct" "POSIXt"
# [1] "26"
x <- Sys.Date()
class(x); format(x, '%U')
# [1] "Date"
# [1] "26"
Using your example data with minor changes:
new_subdata <- read.table(header = TRUE, text = "date CO2 O2 APO
1052042 '2015-10-31 23:48:45' 409.636 -666.39 -353.27
1052043 '2015-10-31 23:50:46' 409.652 -669.62 -356.41
1052044 '2015-11-30 23:52:44' 409.679 -669.44 -356.09
1052045 '2015-11-30 23:54:46' 409.703 -667.07 -353.59
1052046 '2015-12-31 23:56:44' 409.719 -671.02 -357.46
1052047 '2015-12-31 23:58:46' 409.734 NA NA")
## create a new grouping variable with year/week
new_subdata <- within(new_subdata, {
yr_wk <- format(as.Date(date), '%Y %U')
})
## iterate over the unique values
jj <- unique(new_subdata$yr_wk)
# [1] "2015 43" "2015 48" "2015 52"
## do some plotting
par(mfrow = n2mfrow(length(jj)), las = 1, mar = c(5,6,2,2),
tcl = .2, mgp = c(3,.25,0))
xr <- range(new_subdata$O2, na.rm = TRUE)
yr <- range(new_subdata$CO2, na.rm = TRUE)
for (j in jj) {
mypath <- file.path("~", "Documents","Yearly", sprintf("WAO_AIR_Data_%s.png", j))
# png(filename = mypath, width = 963, height = 690)
plot(CO2 ~ O2, data = subset(new_subdata, yr_wk == j), xlim = xr, ylim = yr)
# dev.off()
}

Resources