Create a trading day calendar from scratch - r

I just spent a day debugging some R code, only to find that the problem was caused by a missing date in the data returned by Yahoo via getSymbols. At the time of writing, Yahoo is returning this:
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2014-01-03 87.27 87.35 86.62 86.64 35723700 86.64
2014-01-06 86.66 86.76 86.00 86.32 32073100 86.32
2014-01-07 86.72 87.25 86.56 87.12 25860600 87.12
2014-01-08 87.14 87.55 86.95 87.31 27197400 87.31
2014-01-09 87.63 87.64 86.72 87.02 23674700 87.02
2014-01-13 87.18 87.48 85.68 86.01 48842300 86.01
2014-01-14 86.30 87.72 86.30 87.65 37178900 87.65
2014-01-15 88.03 88.54 87.94 88.37 39835600 88.37
2014-01-16 88.30 88.51 88.16 88.38 31630100 88.38
2014-01-17 88.11 88.37 87.67 87.88 36895800 87.88
The row for 2014-01-10 is missing, although that date is returned for other ETFs. I expect Yahoo will fix the data eventually (the data is on Google), but for now it is wrong, which gave my code fits.
To address this issue I want to check my data to ensure that there is data for all dates the markets were open. If there's a canned way to do this in some package I'd appreciate info on that but to that end I started writing some code using the timeDate package. However I have ended up with xts index questions I don't understand. The code follows:
library(timeDate)
library(quantmod)
MyZone = "UTC"
Sys.setenv(TZ = MyZone)
YearStart = "1990"
YearEnd = "2014"
currentYear = getRmetricsOptions("currentYear")
dateStart = paste0(YearStart, "-01-01")
dateEnd = paste0(YearEnd, "-12-31")
DayCal = timeSequence(from = dateStart, to = dateEnd, by="day", zone = MyZone)
TradingCal = DayCal[isBizday(DayCal, holidayNYSE())]
testSym = "QQQ"
getSymbols(testSym, src="yahoo", from = dateStart, to = dateEnd)
testData = get(testSym)
head(testData)
tail(testData, n=10)
#Save date range of data being checked
firstIndex = index(testData)[1]
lastIndex = index(testData)[nrow(testData)]
#Create an xts series covering all dates
AllDates = xts(x=rep(1, length.out=length(TradingCal)),
order.by=TradingCal, tzone = MyZone)
head(AllDates)
tail(AllDates)
index(AllDates)[1:20]
index(testData)[1:20]
tzone(AllDates)
tzone(testData)
#Create an xts object that has all dates covered
#by testSym but using calendar I created
CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) &&
(index(AllDates)<=lastIndex))
)
class(index(AllDates))
class(index(testData))
The goal here was to create a 'known good calendar' which I could use to create a simple xts object. With that object I would then check whether every index in that object had a corresponding index in the data being tested. However I'm not getting that far as it appears my indexes are not compatible. When I run the code I get this at the end:
> CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) && (index(AllDates)<=lastIndex))
+ )
Error in `>=.default`(index(AllDates), firstIndex) :
comparison (5) is possible only for atomic and list types
> class(index(AllDates))
[1] "timeDate"
attr(,"package")
[1] "timeDate"
> class(index(testData))
[1] "Date"
>
Can someone show me the errors of my ways here so that I can move forward? Thanks!

The comparison fails because index(AllDates) is a timeDate object while index(testData) is a Date. You need to convert TradingCal to Date:
TradingDates <- as.Date(TradingCal)
And here's another way to find index values in TradingDates that aren't in your testData index.
AllDates <- xts(,TradingDates)
testSubset <- paste(start(testData), end(testData), sep="/")
CheckData <- merge(AllDates, testData)[testSubset]
BadDates <- CheckData[is.na(rowSums(CheckData))]
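The same check can be sketched in base R without xts, assuming both calendars are plain Date vectors. The dates below are hypothetical stand-ins for TradingDates and index(testData), and the names trading_dates, observed, and missing_dates are illustrative only. Note the vectorised & in the range filter: && looks at a single value, so it cannot be used to filter a whole index.

```r
# Hypothetical stand-in for TradingDates: two Monday-to-Friday weeks in January 2014
trading_dates <- as.Date("2014-01-06") + c(0:4, 7:11)

# Hypothetical stand-in for index(testData): the same days with 2014-01-10 dropped
observed <- trading_dates[trading_dates != as.Date("2014-01-10")]

# Restrict the calendar to the span covered by the data (vectorised &, not &&)
in_range <- trading_dates >= min(observed) & trading_dates <= max(observed)
calendar_in_range <- trading_dates[in_range]

# Calendar days that have no matching observation
missing_dates <- calendar_in_range[!(calendar_in_range %in% observed)]
print(missing_dates)
```

Because both sides are Date vectors, no class conversion is needed before comparing them.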

Related

Error in if ((location <= 1) | (location >= length(x)) - R - Eventstudies

I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z : time series data for which event frame is to be generated. In the form of an xts object.
Events : a data frame with two columns, unit and when. unit holds the name of the column whose response is to be measured on the event date, while when holds the event date.
Width : width corresponds to the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
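The warning is raised inside findInterval(when, index(x)), so one thing worth checking is the class of the when column. The base-R sketch below uses made-up dates and shows that a character date produces an NA location (with the same coercion warning), while a Date does not. Whether this is the only problem in the code above is an assumption; it has not been checked against the package.

```r
# Made-up index standing in for index(DS_xts)
idx <- as.Date(c("2012-01-03", "2012-01-04", "2012-01-05", "2012-01-06"))

# A date stored as character, as produced by format(as.Date(...))
when_chr <- "2012-01-05"

# findInterval() coerces its first argument to numeric; a character date becomes NA
# (the "NAs introduced by coercion" warning is suppressed here)
loc_chr <- suppressWarnings(findInterval(when_chr, idx))

# The same date as a Date object is located correctly
loc_date <- findInterval(as.Date(when_chr), idx)

print(loc_chr)
print(loc_date)
```

An NA location would make a test such as if ((location <= 1) | (location >= length(x))) fail with "missing value where TRUE/FALSE needed".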

invalid 'tz' value, problems with time zone

I'm working with minute data of NASDAQ, it has the index "2015-07-13 12:05:00 EST". I adjusted the system time with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, therefore I create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint, that in a certain time window, positions are bound to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
The "invalid 'tz' value" error arises because tz must be a single character string; R does not accept a whole column such as tz = df$var. If you set tz = 'America/New_York' or some other single character value, then it will work.
A better answer (instead of using force_tz, as in the previous solution below) for converting UTC times to various time zones based on location. It is also simpler and better than looping through or using a nested ifelse. I subset and change tz based on a timezone column (which my data already has; if yours does not, you can create it). Just make sure you account for all time zones in your data (check unique(df$timezone)):
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
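If the list of zones is long, the same subset-and-format idea can be written once per zone found in the data. This is a minimal sketch with a made-up two-row data frame; the column names follow the snippet above, and the result is a character column, as it is there.

```r
# Made-up data: UTC timestamps plus the zone each row should be displayed in
df <- data.frame(
  datetime = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)

df$datetime2 <- NA_character_
for (z in unique(df$timezone)) {
  rows <- df$timezone == z
  # tz is a single string on every call, which is what format() expects
  df$datetime2[rows] <- format(df$datetime[rows], format = "%Y-%m-%d %H:%M:%S", tz = z)
}
print(df)
```

This loops over zones, not rows, so it stays cheap even for large data.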
Previous solution: Converting to Local Time in R - Vector of Timezones
library(lubridate)
library(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = FALSE)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that timezone, you should use Sys.setenv() with Area/City (Olson) notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
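As a rough base-R illustration of the same idea, the sketch below checks the zone name against OlsonNames() and then flags a time-of-day window by hand. The timestamps are made up, and the window logic only imitates what pos_flat["T13:41/T14:00"] does for minute data; it is not how xts implements it.

```r
tz_name <- "America/New_York"

# Check that the zone name is known to this R installation
stopifnot(tz_name %in% OlsonNames())

# Made-up minute timestamps standing in for index(NASDAQ)
stamps <- as.POSIXct(
  c("2015-07-13 13:40:00", "2015-07-13 13:41:00",
    "2015-07-13 14:00:00", "2015-07-13 14:01:00"),
  tz = tz_name
)

# Extract the clock time in that zone and flag the 13:41 to 14:00 window
hhmm <- format(stamps, "%H:%M", tz = tz_name)
in_window <- hhmm >= "13:41" & hhmm <= "14:00"

# 0 everywhere, 1 inside the window
pos_flat <- as.integer(in_window)
print(pos_flat)
```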
Your error occurs because the time index is misinterpreted. The index needs to be POSIXct (that is, UNIX timestamps) before you can use time-of-day subsetting such as
pos_flat["T13:41/T14:00"] <- 1
Try a conversion of your indices by doing something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
Since you want to use EST, you also have to change the TZ environment variable (unless your system is already in the EST time zone). All in all, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
Best regards

quantmod : can't generate daily returns for stock using OHLC

I'm attempting to get daily returns from a single BDH pull, but I can't get it to work. I considered using quantmod's periodReturn function, but to no avail. I'd like the PctChg column populated, and any help is greatly appreciated.
GetReturns <- function(ticker, calctype, voldays) {
check.numeric <- function(N){
!length(grep("[^[:digit:]]", as.character(N)))}
isnumber <- function(x) is.numeric(x) & !is.na(x)
startdate <- Sys.Date()-20
enddate <- Sys.Date()
###############
GetData <- BBGPull <- bdh(paste(ticker," US EQUITY"), c("Open","High","Low","PX_Last"), startdate, enddate,
include.non.trading.days = FALSE, options = NULL, overrides = NULL,
verbose = FALSE, identity = NULL, con = defaultConnection())
##Clean Up Columns and Remove Ticker
colnames(GetData) <- c("Date","Open","High","Low","Close")
GetData[,"PctChg"] <- "RETURN" ##Hoping to populate this column with returns
GetData
}
I'm not married to the idea of using quantmod, and would even use LN(T/T-1), but I'm just unsure how to add a column with this data. Thank you!
You missed the (important) fact that bdh() still returns a data.frame object you need to transform first:
R> library(Rblpapi)
Rblpapi version 0.3.5 using Blpapi headers 3.8.8.1 and run-time 3.8.8.1.
Please respect the Bloomberg licensing agreement and terms of service.
R> spy <- bdh("SPY US EQUITY", c("Open","High","Low","PX_Last"),
+ Sys.Date()-10, Sys.Date())
R> class(spy)
[1] "data.frame"
R> head(spy)
date Open High Low PX_Last
1 2016-12-05 220.65 221.400 220.420 221.00
2 2016-12-06 221.22 221.744 220.662 221.70
3 2016-12-07 221.52 224.670 221.380 224.60
4 2016-12-08 224.57 225.700 224.260 225.15
5 2016-12-09 225.41 226.530 225.370 226.51
6 2016-12-12 226.40 226.960 225.760 226.25
R> sx <- xts(spy[, -1], order.by=spy[,1])
R> colnames(sx)[4] <- "Close" ## important
R> sxret <- diff(log(Cl(sx)))
R> head(sxret)
Close
2016-12-05 NA
2016-12-06 0.00316242
2016-12-07 0.01299593
2016-12-08 0.00244580
2016-12-09 0.00602225
2016-12-12 -0.00114851
R> sxret <- ClCl(sx) ## quantmod shorthand; gives simple (arithmetic) returns, not log returns
This also uses the xts and quantmod packages, which are assumed to be loaded already (the library() calls are not shown).
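If you would rather keep the data.frame and only add the column, a base-R sketch looks like this. The three closing prices are copied from the output above; SimpleRet is an extra, hypothetical column added for comparison.

```r
# Three closing prices standing in for the bdh() result
GetData <- data.frame(
  Date  = as.Date(c("2016-12-05", "2016-12-06", "2016-12-07")),
  Close = c(221.00, 221.70, 224.60)
)

# Log return ln(P_t / P_{t-1}); the first row has no previous price, so it is NA
GetData$PctChg <- c(NA, diff(log(GetData$Close)))

# Simple return P_t / P_{t-1} - 1, for comparison
GetData$SimpleRet <- c(NA, diff(GetData$Close) / head(GetData$Close, -1))

print(GetData)
```

Padding with a leading NA keeps the new column the same length as the data frame.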

R bizdays trouble making it work

I'm trying to use the bizdays package to generate a vector of business days between two dates.
fer = as.data.frame(as.Date(fer[1:938]))
#Define default calendar
bizdays.options$set(default.calendar=fer)
dt1 = as.Date(Sys.Date())
dt2 = as.Date(Sys.Date()-(365*10)) #sample 10 year window
#Create date vector
datas = bizseq(dt2, dt1)
I get this error: "Error in bizseq.Date(dt2, dt1) : Given date out of range."
The same thing happens with any of the bizdays functions.
Any ideas?
I had a similar problem, but could not apply the accepted answer to my case. What worked for me was to make sure that the first and last holidays in the vector holidays cover (or exceed) the range of dates provided to bizdays():
library(bizdays)
This works (from_date and to_date both lie within the first and last holiday provided by holidays):
holidays <- c("2016-08-10", "2016-08-13")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
#1
This does not work (to_date lies outside of the last holiday of holidays):
holidays <- c("2016-08-10", "2016-08-11")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
# Error in bizdays.Date(from, to, cal) : Given date out of range.
If fer is the holidays, you can try with:
bizdays.options$set(default.calendar=Calendar(holidays=fer))
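For comparison, the goal in the question (a vector of business days between two dates) can be sketched in base R without a calendar object. This is not how bizdays works internally; biz_seq is a hypothetical helper, and the holiday vector is the made-up one from the example above.

```r
# Hypothetical helper: business days between two dates, skipping weekends and holidays
biz_seq <- function(from, to, holidays = as.Date(character(0))) {
  days <- seq(from, to, by = "day")
  wday <- as.POSIXlt(days)$wday  # 0 = Sunday, 6 = Saturday
  days[!(wday %in% c(0, 6)) & !(days %in% holidays)]
}

holidays <- as.Date(c("2016-08-10", "2016-08-13"))
result <- biz_seq(as.Date("2016-08-08"), as.Date("2016-08-14"), holidays)
print(result)
```

A plain filter like this has no notion of a calendar range, so it cannot raise the "out of range" error, but it also lacks the adjustment and counting features of bizdays.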

'by' seems to not retain attribute of 'date' type columns in data.table, possible bug

I have data.table named data like this:
> head(data)
start end unit
1: 2008-11-17 2007-01-23 ADM 2-05
2: 2008-12-29 2007-01-06 BOB 4-07
3: 2008-12-31 2007-01-01 DAT15-02
4: 2008-12-31 2010-01-01 DAT15-06
5: 2008-12-31 2010-01-02 TUW 4-09
6: 2008-12-31 2010-01-02 BEG 5-01
With data types as follows:
sapply(data, class)
start end unit
"Date" "Date" "character"
I'm trying to debug this line:
data[,
list(date = format(seq(from = start, to = end, by = "1 day"), "%Y-%m-%d")),
by = list(start, end, unit)
]
Then I get error message:
Error in del/by : non-numeric argument to binary operator
I figured out that the error is caused by a conversion to numeric, which takes place when I pass a column in the list given to 'by'.
So this modified code works:
data[,
list(date = format(seq(
from = as.Date(start, origin = "1970-01-01"),
to = as.Date(end, origin = "1970-01-01"), by = "1 day"),
"%Y-%m-%d")),
by = list(start, end, unit)
]
This looks like a bug in data.table package. I wonder if anybody knows about this one.
Thanks in advance.
This is now fixed in commit #1256 of v1.9.3. From NEWS:
o Using by columns with attributes (ex: factor, Date) in j did not retain the attributes, also in case of :=. This was partially a regression from an earlier fix (bug #2531) due to recent changes for R3.1.0. Now fixed and clearer tests added. Thanks to Christophe Dervieux for reporting and to AdamB for reporting here on SO. Closes #5437.
Thanks again for reporting.
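For readers on an affected version, the effect described in the question can be imitated in base R by stripping the class from a Date by hand. This is only an illustration of why the as.Date(..., origin = "1970-01-01") workaround helps; it does not use data.table, and the variable names are made up.

```r
start <- as.Date("2008-12-31")
end <- as.Date("2009-01-02")

# Dropping the class leaves the underlying day counts since 1970-01-01
start_num <- unclass(start)
end_num <- unclass(end)

# With plain numbers, by = "1 day" is no longer understood and seq() fails
err <- tryCatch(seq(from = start_num, to = end_num, by = "1 day"),
                error = function(e) e)

# Restoring the Date class makes the original expression work again
days <- format(seq(from = as.Date(start_num, origin = "1970-01-01"),
                   to = as.Date(end_num, origin = "1970-01-01"),
                   by = "1 day"),
               "%Y-%m-%d")
print(days)
```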
