Sub setting times out of time series in R - r

I downloaded stock market data from Yahoo (code below) - for context, at first I tried with getSymbols(^DJI) but I got error messages possibly related to Yahoo... different issue.
The point is that once downloaded, and imported into R, I massaged it into a format close enough to a time series to be able to run chartSeries(DJI):
require(RCurl)
require(foreign)
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI$Date <- NULL # Removing the Date column
chartSeries(DJI, type="auto", theme=chartTheme('white'))
even if the dataset is not really a time series:
> is.ts(DJI)
[1] FALSE
The problem comes about when I try to find out the date of, for instance, the minimum closing value of the Dow. I can do something like
> DJI[DJI$Close == min(DJI$Close),]
Open High Low Close Adj.Close Volume
1985-05-01 1257.18 1262.81 1239.07 1242.05 1242.05 10050000
yielding the entire row, including the row name (1985-05-01), which is the only part I want. However, if I insist on just getting the actual date, I have to juggle a second dataset containing the dates in one of the columns:
require(RCurl)
require(foreign)
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI.raw <- DJI # Second dataset for future subsetting
DJI$Date <- NULL # Removing the Date column
which does allow me to run
> DJI.raw$Date[DJI.raw$Close == min(DJI.raw$Close)]
[1] "1985-05-01"
Further, I don't think that turning the dataset into an .xts file would help.

I'm not clear what you want but it sounds like you just want the date? You mention xts is not an option (which would have been runnable)
time(as.xts(DJI))[which.min(DJI$Close)] # POSIXct format
# [1] "1985-05-01 EDT"
Otherwise a simple rownames + which.min would get the date for you?
as.Date(rownames(DJI)[which.min(DJI$Close)]) # Date format
# [1] "1985-05-01"

Related

Changing Dates in R from webscraper but not able to convert

I am trying to complete a problem that pulls from two data sets that need to be combined into one data set. To get to this point, I need to rbind both data sets by the year-month information. Unfortunately, the first data set needs to be tallied by year-month info, and I can't seem to figure out how to change the date so I can have month-year info rather than month-day-year info.
This is data on avalanches and I need to write code totally the number of avalanches each moth for the Snow Season, defined as Dec-Mar. How do I do that?
I keep trying to convert the format of the date to month-year but after I change it with
as.Date(avalancheslc$Date, format="%y-%m")
all the values for Date turn to NA's....help!
# write the webscraper
library(XML)
library(RCurl)
avalanche<-data.frame()
avalanche.url<-"https://utahavalanchecenter.org/observations?page="
all.pages<-0:202
for(page in all.pages){
this.url<-paste(avalanche.url, page, sep=" ")
this.webpage<-htmlParse(getURL(this.url))
thispage.avalanche<-readHTMLTable(this.webpage, which=1, header=T)
avalanche<-rbind(avalanche,thispage.avalanche)
}
# subset the data to the Salt Lake Region
avalancheslc<-subset(avalanche, Region=="Salt Lake")
str(avalancheslc)
avalancheslc$monthyear<-format(as.Date(avalancheslc$Date),"%Y-%m")
# How can I tally the number of avalanches?
The final output of my dataset should be something like:
date avalanches
2000-1 18
2000-2 4
2000-3 10
2000-12 12
2001-1 52
This should work (I tried it on only 1 page, not all 203). Note the use of the option stringsAsFactors = F in the readHTMLTable function, and the need to add names because 1 column does not automatically get one.
library(XML)
library(RCurl)
library(dplyr)
avalanche <- data.frame()
avalanche.url <- "https://utahavalanchecenter.org/observations?page="
all.pages <- 0:202
for(page in all.pages){
this.url <- paste(avalanche.url, page, sep=" ")
this.webpage <- htmlParse(getURL(this.url))
thispage.avalanche <- readHTMLTable(this.webpage, which = 1, header = T,
stringsAsFactors = F)
names(thispage.avalanche) <- c('Date','Region','Location','Observer')
avalanche <- rbind(avalanche,thispage.avalanche)
}
avalancheslc <- subset(avalanche, Region == "Salt Lake")
str(avalancheslc)
avalancheslc <- mutate(avalancheslc, Date = as.Date(Date, format = "%m/%d/%Y"),
monthyear = paste(year(Date), month(Date), sep = "-"))

R, csv file, As.Date returns only NAs

I have a csv file containing financial data (i.e. dates with corresponding prices). My goal is to load these data in R and convert the dates from character data to dates. I tried the following:
data<-read.csv("data.csv",sep=";")
attach(data)
as.Date(Date,format="%Y-%b-%d") #'Date' is the column containing the dates
Unfortunately, this only leads to NAs in Date. Things that were proposed in other threads on this issue but did not help me:
reading in the csv file with 'stringsAsFactors=FALSE'
formatting the dates in Excel as dates
Here is a sample of my csv file:
Date;Open;High;Low;Close;Volume;Adj Close
30.10.2015;10842.51953;10850.58008;10748.7002;10850.13965;89270000;10850.13965
29.10.2015;10867.19043;10886.98047;10741.13965;10800.83984;122513100;10800.83984
28.10.2015;10728.16016;10848.41016;10691.62988;10831.95996;0;10831.95996
27.10.2015;10761.37012;10807.41016;10692.19043;10692.19043;0;10692.19043
26.10.2015;10791.17969;10863.08984;10756.83008;10801.33984;73091500;10801.33984
23.10.2015;10610.33008;10847.46973;10586.95996;10794.54004;0;10794.54004
22.10.2015;10213.00977;10508.25;10194.74023;10491.96973;107511600;10491.96973
21.10.2015;10185.41992;10277.58984;10107.91992;10238.09961;70021400;10238.09961
20.10.2015;10174.79981;10194.53027;10080.19043;10147.67969;67235200;10147.67969
Your format argument was incorrect, which is usually the cause of NAs when coercing strings to Date objects. You can use this instead:
R> as.Date(Df$Date, format = "%d.%m.%Y")
#[1] "2015-10-30" "2015-10-29" "2015-10-28" "2015-10-27" "2015-10-26"
#[6] "2015-10-23" "2015-10-22" "2015-10-21" "2015-10-20"
Instead of attach, you can use alternatives such as within to avoid qualifying your column names. For example,
Df <- within(Df, {
Date <- as.Date(Date, format = "%d.%m.%Y")
})
##
R> class(Df$Date)
#[1] "Date"
Data:
Df <- read.table(
text = "Date;Open;High;Low;Close;Volume;Adj Close
30.10.2015;10842.51953;10850.58008;10748.7002;10850.13965;89270000;10850.13965
29.10.2015;10867.19043;10886.98047;10741.13965;10800.83984;122513100;10800.83984
28.10.2015;10728.16016;10848.41016;10691.62988;10831.95996;0;10831.95996
27.10.2015;10761.37012;10807.41016;10692.19043;10692.19043;0;10692.19043
26.10.2015;10791.17969;10863.08984;10756.83008;10801.33984;73091500;10801.33984
23.10.2015;10610.33008;10847.46973;10586.95996;10794.54004;0;10794.54004
22.10.2015;10213.00977;10508.25;10194.74023;10491.96973;107511600;10491.96973
21.10.2015;10185.41992;10277.58984;10107.91992;10238.09961;70021400;10238.09961
20.10.2015;10174.79981;10194.53027;10080.19043;10147.67969;67235200;10147.67969",
header = TRUE, stringsAsFactors = FALSE, sep = ";")

Subsetting data table in R by date

I need to subset my data by a date range, below is the code.
I read in two .csv (data2010, data2), I changed the date format to exclude the timestamp, rename the headers so they are the same for both files, then merge(data2011).
The files seem to actually merge but when I subset by the date range, no observations are created.
However, the date is grouped like 01/01/10 01/01/11 01/02/10 01/02/11 =
so same month/same day/different year pairing.
data2010 <- read.csv(file="2010final.csv")
data2 <- read.csv(file="2011final.csv")
#change format of timestamp to date with mm/dd/yyyy for 2011
data2$newdate <-strptime(as.character(data2$Date), "%m/%d/%y")
data2$Date <- format(data2$newdate, "%m/%d/%y")
data2$newdate <- NULL
#rename and format 2010
names(data2010) <- c("Region", "District", "Age", "Gender", "Marital Status", "Date", "Reason")
data2010$newdate <-strptime(as.character(data2010$Date), "%m/%d/%y %H")
data2010$Date <- format(data2010$newdate, "%m/%d/%y")
data2010$newdate <- NULL
#merge
data2011 <- rbind(data2010, data2)
summary(data2011)
str(data2011)
#I see from the above commands that the files have merged
jan6Before <- subset(data2011, Date >= "12/22/10" & Date <= "01/06/11")
summary(jan6Before)
str(jan6Before)
#But this does not produce any observations
I suspect it's because your Date variable is a character, not date, being compared to another character constant i.e. "12/22/10".
I suggest you have a look at the package lubridate. You can then easily convert character (in this case month-date-year) to compare, e.g. mdy(Date) >= mdy("12/22/10") .
Merge on your variable newDate, and use that for subsetting also.

unexpected date format when creating a xts object with minute intervals in R

I'm retrieving one minute quotes from google. After processing the data I try to create an xts object with one minute intervals but get same datetime repeated several times but don't understand why. Note that if I use the same data to build a vector of timestamps called my.dat2it does work.
library(xts)
url <- 'https://www.google.com/finance/getprices?q=IBM&i=60&p=15d&f=d,o,h,l,c,v'
x <- read.table(url,stringsAsFactors = F)
mynam <- unlist(strsplit(unlist(strsplit(x[5,], split='=', fixed=TRUE))[2] , split=','))
interv <- as.numeric(unlist(strsplit(x[4,], split='=', fixed=TRUE))[2])
x2 <- do.call(rbind,strsplit(x[-(1:7),1],split=','))
rownames(x2) <- NULL
colnames(x2) <- mynam
ind <- which(nchar(x2[,1])>5)
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
#To convert from data.frame to numeric
class(x2) <- 'numeric'
my.dat <- rep(0,nrow(x2))
#Convert all to same format
for (i in 1:nrow(x2)) {
if (nchar(x2[i,1])>5) {
ini.dat <- x2[i,1]
my.dat[i] <- ini.dat
} else {
my.dat[i] <- ini.dat+interv*x2[i,1]
}
}
df <- xts(x2[,-1],as.POSIXlt(my.dat, origin = '1970-01-01'))
head(df,20)
my.dat2 <- as.POSIXlt(my.dat, origin = '1970-01-01')
head(my.dat2,20)
I tried a simpler example simulating the data and creating a sequence of dates by minute to create the xts object and it worked so it must be something that I'm missing when passing the dates to the xts function.
Your my.dat object has duplicated values and xts and zoo objects must be ordered, so all the duplicate values are being grouped together.
The issue is this line, where you only take the second element, rather than every non-blank element.
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
# this should be
x2[ind,1] <- sapply(strsplit(x2[ind,1], split='a', fixed=TRUE), "[[", 2)

yahoo tickers, time zone, and merging

I would like to download daily data from yahoo for the S&P 500, the DJIA, and 30-year T-Bonds, map the data to the proper time zone, and merge them with my own data. I have several questions.
My first problem is getting the tickers right. From yahoo's website, it looks like the tickers are: ^GSPC, ^DJI, and ^TYX. However, ^DJI fails. Any idea why?
My second problem is that I would like to constrain the time zone to GMT (I would like to ensure that all my data is on the same clock, GMT seems like a neutral choice), but I couldn' get it to work.
My third problem is that I would like to merge the yahoo data with my own data, obtained by other means and available in a different format. It is also daily data.
Here is my attempt at constraining the data to the GMT time zone. Executed at the top of my R script.
Sys.setenv(TZ = "GMT")
# > Sys.getenv("TZ")
# [1] "GMT"
# the TZ variable is properly set
# but does not affect the time zone in zoo objects, why?
Here is my code to get the yahoo data:
library("tseries")
library("xts")
date.start <- "1999-12-31"
date.end <- "2013-01-01"
# tickers <- c("GSPC","TYX","DJI")
# DJI Fails, why?
# http://finance.yahoo.com/q?s=%5EDJI
tickers <- c("GSPC","TYX") # proceed without DJI
z <- zoo()
index(z) <- as.Date(format(time(z)),tz="")
for ( i in 1:length(tickers) )
{
cat("Downloading ", i, " out of ", length(tickers) , "\n")
x <- try(get.hist.quote(
instrument = paste0("^",tickers[i])
, start = date.start
, end = date.end
, quote = "AdjClose"
, provider = "yahoo"
, origin = "1970-01-01"
, compression = "d"
, retclass = "zoo"
, quiet = FALSE )
, silent = FALSE )
print(x[1:4]) # check that it's not empty
colnames(x) <- tickers[i]
z <- try( merge(z,x), silent = TRUE )
}
Here is the dput(head(df)) of my dataset:
df <- structure(list(A = c(-0.011489000171423, -0.00020300000323914,
0.0430639982223511, 0.0201549995690584, 0.0372899994254112, -0.0183669999241829
), B = c(0.00110999995376915, -0.000153000000864267, 0.0497750006616116,
0.0337960012257099, 0.014121999964118, 0.0127800004556775), date = c(9861,
9862, 9863, 9866, 9867, 9868)), .Names = c("A", "B", "date"
), row.names = c("0001-01-01", "0002-01-01", "0003-01-01", "0004-01-01",
"0005-01-01", "0006-01-01"), class = "data.frame")
I'd like to merge the data in df with the data in z. I can't seem to get it to work.
I am new to R and very much open to your advice about efficiency, best practice, etc.. Thanks.
EDIT: SOLUTIONS
On the first problem: following GSee's suggestions, the Dow Jones Industrial Average data may be downloaded with the quantmod package: thus, instead of the "^DJI" ticker, which is no longer available from yahoo, use the "DJIA" ticker. Note that there is no caret in the "DJIA" ticker.
On the second problem, Joshua Ulrich points out in the comments that "Dates don't have timezones because days don't have a time component."
On the third problem: The data frame appears to have corrupted dates, as pointed out by agstudy in the comments.
My solutions rely on the quantmod package and the attached zoo/xts packages:
library(quantmod)
Here is the code I have used to get proper dates from my csv file:
toDate <- function(x){ as.Date(as.character(x), format("%Y%m%d")) }
dtz <- read.zoo("myData.csv"
, header = TRUE
, sep = ","
, FUN = toDate
)
dtx <- as.xts(dtz)
The dates in the csv file were stored in a single column in the format "19861231". The key to getting correct dates was to wrap the date in "as.character()". Part of this code was inspired from R - Stock market data from csv to xts. I also found the zoo/xts manuals helpful.
I then extract the date range from this dataset:
date.start <- start(dtx)
date.end <- end(dtx)
I will use those dates with quantmod's getSymbols function so that the other data I download will cover the same period.
Here is the code I have used to get all three tickers.
tickers <- c("^GSPC","^TYX","DJIA")
data <- new.env() # the data environment will store the data
do.call(cbind, lapply( tickers
, getSymbols
, from = date.start
, to = date.end
, env = data # data saved inside an environment
)
)
ls(data) # see what's inside the data environment
data$GSPC # access a particular ticker
Also note, as GSee pointed out in the comments, that the option auto.assign=FALSE cannot be used in conjunction with the option env=data (otherwise the download fails).
A big thank you for your help.
Yahoo doesn't provide historical data for ^DJI. Currently, it looks like you can get the same data by using the ticker "DJIA", but your mileage may vary.
It does work in this case because you're only dealing with Dates
the df object your provided is yearly data beginning in the year 0001. So, that's probably not what you wanted.
Here's how I would fetch and merge those series (or use an environment and only make one call to getSymbols)
library(quantmod)
do.call(cbind, lapply(c("^GSPC", "^TYX"), getSymbols, auto.assign=FALSE))

Resources