Run analysis only on an intraday time period - zoo - r

I need to run an analysis from 10AM to 4PM.
The original data runs from 9 AM to 5 PM, everyday for one year.
How can I include only the indicated time period in the analysis?
The window function in zoo does not help here.
structure(c(0, 7.12149266486255e-05, 0.000142429853297251, 0.000213644779945877,
0.000284859706594502, 0.000356074633243128, 0.000427289559891753,
0.000498504486540379, 0.000569719413189004, 0.00064093433983763,
0.000712149266486256, 0.000783364193134881, 0.000854579119783507,
0.000925794046432132, 0.000997008973080758, 0.00106822389972938,
0.00113943882637801, 0.00121065375302663, 0.00128186867967526,
0.00135308360632389, 0.00142429853297251, 0.00149551345962114,
0.00156672838626976, 0.00163794331291839, 0.00170915823956701,
0.00178037316621564, 0.00185158809286426, 0.00192280301951289,
0.00199401794616152, 0.00206523287281014), index = structure(c(1009942620,
1009942680, 1009942740, 1009942800, 1009942860, 1009942920, 1009942980,
1009943040, 1009943100, 1009943160, 1009943220, 1009943280, 1009943340,
1009943400, 1009943460, 1009943520, 1009943580, 1009943640, 1009943700,
1009943760, 1009943820, 1009943880, 1009943940, 1009944000, 1009944060,
1009944120, 1009944180, 1009944240, 1009944300, 1009944360), class = c("POSIXct",
"POSIXt")), class = "zoo")
How can I select the periods with time > 10 AM and time < 4 PM across several days?

If z is the zoo object then
1) use this to extract hour of each time point and then subset to only those that are 10, 11, 12, 13, 14 or 15.
z[format(time(z), "%H") %in% 10:15]
2) or use this alternative which is similar but uses POSIXlt to get the hour:
z[as.POSIXlt(time(z))$hour %in% 10:15]
3) or convert the series to xts and use this:
x <- as.xts(z)["T10:00/T15:59"]  # through 15:59, matching hours 10 to 15 above
drop(as.zoo(x))
Omit the second line if it is ok to return an xts series.
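As a minimal sketch of approaches 1) and 2), the following builds a hypothetical half-hourly series (the dates, values and the UTC time zone are assumptions made for illustration, not the asker's data) and checks that both filters select the same observations:

```r
library(zoo)

# Hypothetical data: half-hourly observations spanning two days,
# with the time zone fixed to UTC so the result does not depend on the session.
idx <- seq(as.POSIXct("2002-01-02 09:00", tz = "UTC"),
           as.POSIXct("2002-01-03 17:00", tz = "UTC"),
           by = "30 min")
z <- zoo(seq_along(idx), order.by = idx)

# Approach 1: hour taken from the formatted time stamp
by_format <- z[format(time(z), "%H") %in% 10:15]

# Approach 2: hour taken from the POSIXlt representation
by_posixlt <- z[as.POSIXlt(time(z))$hour %in% 10:15]

# Both approaches keep the same observations (10:00 through 15:30 on each day)
identical(coredata(by_format), coredata(by_posixlt))
range(format(time(by_format), "%H:%M"))
```

Because the filter looks only at the hour, it applies to every day in the series at once.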
Time Zone
Be sure that you have the time zone set correctly since the time in one time zone is not the same as in another time zone.
We can query the current time zone of the session like this:
Sys.timezone()
and can set it like this:
Sys.setenv(TZ = "...")
where ... is replaced with the time zone desired. Common settings are:
Sys.setenv(TZ = "GMT")
Sys.setenv(TZ = "") # restore default
The following will show the possible time zones that can be used:
OlsonNames()
You only need all this if the time zone of your session is not already set to the time zone of the data.
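To illustrate why the time zone matters, here is a small base R sketch. The time stamp and the two zones are hypothetical choices. It also shows that format() accepts a tz argument, which is one way to filter by local hour without changing the session's TZ setting:

```r
# One instant, viewed in two time zones (January, so no daylight saving time).
t_utc <- as.POSIXct("2002-01-02 15:30:00", tz = "UTC")
hour_utc <- format(t_utc, "%H", tz = "UTC")
hour_ny  <- format(t_utc, "%H", tz = "America/New_York")
c(UTC = hour_utc, New_York = hour_ny)

# The same tz argument can be used in the hour filter, leaving TZ untouched.
times <- t_utc + 3600 * 0:7   # hourly steps from 15:30 to 22:30 UTC
keep  <- format(times, "%H", tz = "America/New_York") %in% 10:15
times[keep]
```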

You could build a tibble with time, value and hour columns (df below is the zoo object), and then keep only the rows between 10 AM and 4 PM.
library(dplyr)
library(zoo)
tibble(time = index(df),
value = coredata(df),
hour = lubridate::hour(time)) %>%
filter(between(hour, 10, 15)) -> result
result

Related

invalid 'tz' value, problems with time zone

I'm working with minute data for NASDAQ whose index looks like "2015-07-13 12:05:00 EST". I set the session time zone with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, therefore I create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint, that in a certain time window, positions are bound to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
The "invalid 'tz' value" error arises because tz must be a single character string, so a whole column such as tz = df$var is not accepted. If you set tz = 'America/New_York' or some other single character value, it will work.
A better answer than the force_tz solution below for converting UTC times to various time zones based on location. It is also simpler than looping through the rows or using a nested ifelse. I subset the rows and change tz based on a timezone column (my data already has one; if yours does not, you can create it). Just make sure you account for all time zones in your data
(unique(df$timezone))
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
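The same idea can be written without one line per zone. The sketch below is only an illustration: the data frame, its column names and the local column are hypothetical, and it iterates over the distinct zones rather than over the rows:

```r
# Hypothetical data: UTC time stamps plus the zone each row should be shown in
df <- data.frame(
  datetime = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)

# Format each group of rows in its own zone; tz is a single string each time
df$local <- NA_character_
for (zone in unique(df$timezone)) {
  rows <- df$timezone == zone
  df$local[rows] <- format(df$datetime[rows], "%Y-%m-%d %H:%M:%S", tz = zone)
}
df
```

The result column is character, as in the format() calls above, because a single POSIXct vector cannot carry a different time zone per element.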
Previous solution: Converting to Local Time in R - Vector of Timezones
require(lubridate)
require(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = FALSE)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that timezone, you should use Sys.setenv() with Country/City notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
Your error occurs because of a misinterpretation of the time object. You need to have UNIX timestamps in order to use something like
pos_flat["T13:41/T14:00"] <- 1
Try a conversion of your indices by doing something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
As you want to use EST, you have to change your environment variables (if you are not living in EST timezone). So all in all, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
Best regards

R bizdays trouble making it work

I'm trying to use the bizdays package to generate a vector of business days between two dates.
fer = as.data.frame(as.Date(fer[1:938]))
#Define default calendar
bizdays.options$set(default.calendar=fer)
dt1 = as.Date(Sys.Date())
dt2 = as.Date(Sys.Date()-(365*10)) #sample 10 year window
#Create date vector
datas = bizseq(dt2, dt1)
I get this error: "Error in bizseq.Date(dt2, dt1) : Given date out of range."
I see the same behavior with the other bizdays functions.
Any ideas?
I had a similar problem, but could not apply the accepted answer to my case. What worked for me was to make sure that the first and last holiday in the vector holidays at least covers (or exceeds) the range of dates provided to bizdays():
library(bizdays)
This works (from_date and to_date both lie within the first and last holiday provided by holidays):
holidays <- c("2016-08-10", "2016-08-13")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
#1
This does not work (to_date lies outside of the last holiday of holidays):
holidays <- c("2016-08-10", "2016-08-11")
from_date <- "2016-08-11"
to_date <- "2016-08-12"
cal <- Calendar(holidays, weekdays=c('sunday', 'saturday'))
bizdays(from_date, to_date, cal)
# Error in bizdays.Date(from, to, cal) : Given date out of range.
If fer is the holidays, you can try with:
bizdays.options$set(default.calendar=Calendar(holidays=fer))
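In more recent versions of bizdays, calendars are built with create.calendar(), which takes start.date and end.date arguments, so the valid range does not have to be inferred from the first and last holiday. A minimal sketch, assuming such a version is installed; the calendar name, holiday and dates are hypothetical:

```r
library(bizdays)

# Hypothetical calendar: one holiday, weekends off, explicit valid range
cal <- create.calendar(
  "hypothetical_cal",
  holidays   = as.Date("2016-08-11"),
  weekdays   = c("saturday", "sunday"),
  start.date = as.Date("2016-01-01"),
  end.date   = as.Date("2016-12-31")
)

# Business days from Wednesday 2016-08-10 to Tuesday 2016-08-16:
# the holiday on the 11th and the weekend of the 13th and 14th are skipped
dates <- bizseq(as.Date("2016-08-10"), as.Date("2016-08-16"), cal)
dates
```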

Create a trading day calendar from scratch

I just spent a day debugging some R code only to find that the problem I was having was caused by a missing date in the data returned by Yahoo using getSymbols. At the time I write this, Yahoo is returning this:
QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
2014-01-03 87.27 87.35 86.62 86.64 35723700 86.64
2014-01-06 86.66 86.76 86.00 86.32 32073100 86.32
2014-01-07 86.72 87.25 86.56 87.12 25860600 87.12
2014-01-08 87.14 87.55 86.95 87.31 27197400 87.31
2014-01-09 87.63 87.64 86.72 87.02 23674700 87.02
2014-01-13 87.18 87.48 85.68 86.01 48842300 86.01
2014-01-14 86.30 87.72 86.30 87.65 37178900 87.65
2014-01-15 88.03 88.54 87.94 88.37 39835600 88.37
2014-01-16 88.30 88.51 88.16 88.38 31630100 88.38
2014-01-17 88.11 88.37 87.67 87.88 36895800 87.88
which is missing 2014-01-10. That date is returned for other ETFs. I expect that Yahoo will fix the data one of these days (the data is on Google) but for now it is wrong which caused my code some fits.
To address this issue I want to check my data to ensure that there is data for all dates the markets were open. If there's a canned way to do this in some package I'd appreciate info on that but to that end I started writing some code using the timeDate package. However I have ended up with xts index questions I don't understand. The code follows:
library(timeDate)
library(quantmod)
MyZone = "UTC"
Sys.setenv(TZ = MyZone)
YearStart = "1990"
YearEnd = "2014"
currentYear = getRmetricsOptions("currentYear")
dateStart = paste0(YearStart, "-01-01")
dateEnd = paste0(YearEnd, "-12-31")
DayCal = timeSequence(from = dateStart, to = dateEnd, by="day", zone = MyZone)
TradingCal = DayCal[isBizday(DayCal, holidayNYSE())]
testSym = "QQQ"
getSymbols(testSym, src="yahoo", from = dateStart, to = dateEnd)
testData = get(testSym)
head(testData)
tail(testData, n=10)
#Save date range of data being checked
firstIndex = index(testData)[1]
lastIndex = index(testData)[nrow(testData)]
#Create an xts series covering all dates
AllDates = xts(x=rep(1, length.out=length(TradingCal)),
order.by=TradingCal, tzone = MyZone)
head(AllDates)
tail(AllDates)
index(AllDates)[1:20]
index(testData)[1:20]
tzone(AllDates)
tzone(testData)
#Create an xts object that has all dates covered
#by testSym but using calendar I created
CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) &&
(index(AllDates)<=lastIndex))
)
class(index(AllDates))
class(index(testData))
The goal here was to create a 'known good calendar' which I could use to create a simple xts object. With that object I would then check whether every index in that object had a corresponding index in the data being tested. However I'm not getting that far as it appears my indexes are not compatible. When I run the code I get this at the end:
> CheckData = subset(AllDates, ((index(AllDates)>=firstIndex) && (index(AllDates)<=lastIndex))
+ )
Error in `>=.default`(index(AllDates), firstIndex) :
comparison (5) is possible only for atomic and list types
> class(index(AllDates))
[1] "timeDate"
attr(,"package")
[1] "timeDate"
> class(index(testData))
[1] "Date"
>
Can someone show me the errors of my ways here so that I can move forward? Thanks!
You need to convert TradingCal to Date:
TradingDates <- as.Date(TradingCal)
And here's another way to find index values in TradingDates that aren't in your testData index.
AllDates <- xts(,TradingDates)
testSubset <- paste(start(testData), end(testData), sep="/")
CheckData <- merge(AllDates, testData)[testSubset]
BadDates <- CheckData[is.na(rowSums(CheckData))]
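Once both indexes are of class Date, the check itself can also be sketched in base R. The vectors below are hypothetical stand-ins for the trading calendar and for index(testData), using the dates around the gap described in the question:

```r
# Hypothetical stand-ins: expected trading days and the dates actually returned
expected <- as.Date(c("2014-01-08", "2014-01-09", "2014-01-10", "2014-01-13"))
observed <- as.Date(c("2014-01-08", "2014-01-09", "2014-01-13"))

# Trading days that have no corresponding row in the data
missing_dates <- expected[!(expected %in% observed)]
missing_dates
```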

Error running bfast function

This is probably obvious but I cannot see where the problem is. I’m trying to run bfast on a yearly time series to detect abrupt changes in trend but keep getting the following error (it is indeed the call to stl that is causing the problem):
bfast(GM85.ts,h=0.15,max.iter=1,season="none")
Error in stl(Yt, "periodic") : series is not periodic or has less than two periods
My time series has frequency 1 and 95 years of data
GM85.ts
Time Series:
Start = 2006
End = 2100
Frequency = 1
[1] 13.88868 13.89915 13.91431 13.93718 13.94067 13.94063 13.96324 13.99648 14.01391 14.03268 14.04667 14.05893 14.05230 14.06443 14.07909 14.11433 14.14736 14.14514 14.15454 14.19593 14.23417 14.23578 14.25171 14.27545 14.27213
[26] 14.29543 14.32851 14.34124 14.36091 14.38245 14.41517 14.42666 14.45183 14.49599 14.50378 14.52052 14.54298 14.58360 14.60798 14.62069 14.64962 14.68641 14.71247 14.72497 14.76606 14.79369 14.81297 14.84822 14.86503 14.89134
[51] 14.92601 14.95497 14.98318 15.01789 15.05929 15.09193 15.11453 15.14574 15.17960 15.20188 15.23737 15.27275 15.28612 15.32248 15.34883 15.38858 15.42155 15.45223 15.48342 15.51099 15.54076 15.58005 15.59959 15.63353 15.66272
[76] 15.69312 15.71358 15.73641 15.76502 15.79923 15.83983 15.87472 15.91833 15.93602 15.99177 16.03119 16.05529 16.07834 16.10982 16.14174 16.17376 16.22898 16.25100 16.27703 16.30971
Therefore it is periodic and has more than two periods… what is causing the error then?
Seasonality only occurs when there is more than one observation per year. bfast can't find a harmonic equation given a single value.
You need to specify frequency when you create the time series object using ts(). Example of data with six measurements per year:
mydata <- rep(c(1,4,6,8,5,2), 50) + rnorm(50*6)
plot(mydata, type = "l")
ts <- ts(mydata, frequency = 6, start = 1969)
Check by typing ts, which prints:
Time Series:
Start = c(1969, 1)
End = c(2018, 6)
Frequency = 6
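The stl error can be reproduced directly in base R, which shows that it depends on the frequency rather than on the length of the series. This is a sketch on made-up data: the yearly values below are a hypothetical linear trend, not the GM85 series.

```r
# A yearly series (frequency 1), similar in shape to the one in the question
yearly <- ts(seq(13.9, 16.3, length.out = 95), start = 2006, frequency = 1)

# stl() needs a period of at least 2, so this call fails; capture the message
res <- tryCatch(stl(yearly, "periodic"),
                error = function(e) conditionMessage(e))
res

# With six observations per year the same call succeeds
set.seed(1)
sub_annual <- ts(rep(c(1, 4, 6, 8, 5, 2), 50) + rnorm(50 * 6),
                 frequency = 6, start = 1969)
fit <- stl(sub_annual, "periodic")
class(fit)
```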

Averaging multiple time series contained in a list object month-by-month

I have a list object with multiple slots, 2 in this example, each containing a time series of monthly data over 5 years (60 values). I want to calculate an "average" time series from these two, i.e., the January value of my new series should be the mean of the two Januarys from the two time series, and so on. I thought of using lapply() but, if I understand correctly, that is used to apply functions within slots and not across them, but I could be wrong.
Here is a dput() of my data:
list(structure(c(-2.70881589936525, -1.25455287657218, 2.20891093254408,
5.47494447650721, 9.22974987813887, 12.0978184689361, 15.8529078203063,
14.5682520133134, 10.8615272853853, 5.13086415717895, 0.728917940858284,
2.13993708024285, 0.0592607633855364, -1.08188244487586, -1.19467731719249,
5.03740002827978, 10.3763483415682, 13.3292119845773, 12.838352493412,
15.3580851547661, 9.4829099091539, 6.56223017400025, 1.36042454896383,
0.899805834524198, -2.13189083053455, -0.083918862391372, 0.994166453698637,
2.71436535566226, 11.3453352141603, 15.0712013841955, 13.7110193822507,
9.8693411661721, 9.60321957581941, 5.2375499185438, -0.184162586424226,
-1.50175258729513, -6.9445058128996, -3.21184575341925, 0.383804323362742,
5.59544079002557, 7.80248270514967, 12.4958346580684, 14.3387761065989,
12.1472112399243, 12.3920738957853, 7.03456285321734, 1.04268672395181,
-1.38758815045495, -3.32477056135693, -0.447356879470411, 4.56295165574707,
5.68189626665318, 6.74697976141299, 12.0703824641417, 16.8904454284777,
14.2920547883889, 12.1655598473256, 6.77734909883441, 3.00180135903895,
1.94856648801937), .Tsp = c(2001, 2005.91666666667, 12), class = "ts"),
structure(c(-1.63889806183691, -3.44715647302858, 0.394739200062096,
5.23920359311239, 9.57664849661865, 14.0415975194851, 16.7884967982053,
13.6157495618028, 10.5269221330342, 7.71132825720641, -0.0288215700483627,
-3.13091409964762, -0.970448606448803, -1.87539827694689,
0.765137214031195, 4.44395722618218, 10.680721392289, 10.3468681880514,
14.3053058161559, 16.3132350056912, 12.8839577165089, 9.98091681764607,
2.69020486688223, 0.290392068555248, -0.924761078500446,
-5.67205756065117, 1.41326224137296, 6.36338872204811, 8.92258840663339,
13.0624643120579, 12.8689225828453, 14.3836922928304, 12.3805992934003,
7.60234172866889, 2.86744304241512, 1.35829952041713, -2.82629733903844,
-0.768552317763034, -0.568688011194226, 3.57676644057355,
4.99664005346314, 11.0140757656585, 15.498475017639, 13.4278279144656,
11.8598222456967, 7.31027938974563, 3.10247804880477, -2.67278197280972,
-2.49516231672057, -3.63941768231319, 1.89945951183736, 4.26424942213747,
9.37058647983393, 14.5133688239731, 14.6719630140624, 15.5022840542944,
13.3686764903323, 6.20332371420166, 3.05229549361941, -0.975912393030021
), .Tsp = c(2001, 2005.91666666667, 12), class = "ts"))
If there is an automated way of doing this, that would be great, because I will eventually have a list with 1000 ts() objects, each with 600 data points.
Thanks.
If L is the list of "ts" objects then assuming the time index of each component series is the same:
1) rowMeans/cbind
combined <- ts(rowMeans(do.call("cbind", L)))
tsp(combined) <- tsp(L[[1]]) # fix up times
2) Reduce
Reduce("+", L) / length(L)
These should both work even if there are more than 2 components in L.
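As a quick check on made-up data (the list below is hypothetical: three random monthly series sharing the same time index), both approaches give the same values and the same time attributes:

```r
# Hypothetical input: three monthly series with an identical time index
set.seed(42)
L <- replicate(3, ts(rnorm(24), start = c(2001, 1), frequency = 12),
               simplify = FALSE)

# 1) rowMeans/cbind, then restore the time attributes
combined <- ts(rowMeans(do.call("cbind", L)))
tsp(combined) <- tsp(L[[1]])

# 2) Reduce keeps the "ts" attributes of the components
reduced <- Reduce("+", L) / length(L)

all.equal(as.numeric(combined), as.numeric(reduced))
all.equal(tsp(combined), tsp(reduced))
```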
