Intraday high/low clustering - r

I am attempting to perform a study on the clustering of high/low points based on time. I managed to achieve the above by using to.daily on intraday data and merging the two using:
intraday.merge <- merge(intraday,daily)
intraday.merge <- na.locf(intraday.merge)
intraday.merge <- intraday.merge["T08:30:00/T16:30:00"] # remove record at 00:00:00
Next, I tried to obtain the records where the high == daily.high/low == daily.low using:
intradayhi <- test[test$High == test$Daily.High]
intradaylo <- test[test$Low == test$Daily.Low]
Resulting data resembles the following:
Open High Low Close Volume Daily.Open Daily.High Daily.Low Daily.Close Daily.Volume
2012-06-19 08:45:00 258.9 259.1 258.5 258.7 1424 258.9 259.1 257.7 258.7 31523
2012-06-20 13:30:00 260.8 260.9 260.6 260.6 1616 260.4 260.9 259.2 260.8 35358
2012-06-21 08:40:00 260.7 260.8 260.4 260.5 493 260.7 260.8 257.4 258.3 31360
2012-06-22 12:10:00 255.9 256.2 255.9 256.1 626 254.5 256.2 253.9 255.3 50515
2012-06-22 12:15:00 256.1 256.2 255.9 255.9 779 254.5 256.2 253.9 255.3 50515
2012-06-25 11:55:00 254.5 254.7 254.4 254.6 1589 253.8 254.7 251.5 253.9 65621
2012-06-26 08:45:00 253.4 254.2 253.2 253.7 5849 253.8 254.2 252.4 253.1 70635
2012-06-27 11:25:00 255.6 256.0 255.5 255.9 973 251.8 256.0 251.8 255.2 53335
2012-06-28 09:00:00 257.0 257.3 256.9 257.1 601 255.3 257.3 255.0 255.1 23978
2012-06-29 13:45:00 253.0 253.4 253.0 253.4 451 247.3 253.4 246.9 253.4 52539
There are duplicated results using the subset, how do I achieve only the first record of the day? I would then be able to plot the count of records for periods in the day.
Also, are there alternate methods to get the results I want? Thanks in advance.
Edit:
Sample output should look like this, count could either be 1st result for day or aggregated (more than 1 occurrence in that day):
Time Count
08:40:00 60
08:45:00 54
08:50:00 60
...
14:00:00 20
14:05:00 12
14:10:00 30

You can get the first observation of each day via:
y <- apply.daily(x, first)
Then you can simply aggregate the count based on hours and minutes:
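# sum a vector of ones so each time of day gets a count (summing 1:NROW(y) would add up row numbers)
z <- aggregate(rep(1, NROW(y)), by=list(Time=format(index(y),"%H:%M")), FUN=sum)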
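A base-R sketch of the same idea, assuming a hypothetical vector of timestamps at which the intraday high equalled the daily high (the sample values and the names hits, first_hits and counts are illustrative, not from the question). It keeps the timestamp of the first matching record itself, which may matter because, as far as I can tell, apply.daily() labels each day's result with the last timestamp of that day, so it is worth checking index(y) before counting:

```r
# Hypothetical sample: timestamps of bars that matched the daily high
hits <- as.POSIXct(c("2012-06-21 08:40:00", "2012-06-22 12:10:00",
                     "2012-06-22 12:15:00", "2012-06-26 08:45:00",
                     "2012-06-19 08:45:00"), tz = "UTC")
hits <- sort(hits)

# Keep only the first hit of each calendar day
first_hits <- hits[!duplicated(as.Date(hits, tz = "UTC"))]

# Count how many days had their first hit at each time of day
counts <- as.data.frame(table(Time = format(first_hits, "%H:%M:%S")),
                        responseName = "Count")
print(counts)
```

Dropping the !duplicated() step gives the aggregated variant, where a day with several matching bars contributes more than once.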

Related

r - Plotting monthly time series data in R - cannot plot more than 10 series

I'm having a lot of trouble plotting my time series data in R Studio. My data is laid out as follows:
tsf
Time Series:
Start = 1995
End = 2021
Frequency = 1
Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
1995 10817 8916 9697 10314 9775 7125 9007 6000 4155 3692 2236 996
1996 12773 12562 13479 14280 13839 9168 10959 6582 5162 4815 3768 1946
1997 14691 12982 13545 14131 14162 10415 11420 7870 6340 6869 6777 6637
1998 17192 15480 14703 16903 15921 13381 13779 9127 6676 6511 5419 3447
1999 13578 19470 23411 18190 18979 17296 16588 12561 10405 8537 7304 4003
2000 20100 29419 30125 27147 27832 23874 19728 15847 11477 9301 6933 3486
2001 16528 22258 22146 19027 19436 15688 14558 10609 6799 6563 4816 2480
2002 14724 19424 21391 17215 18775 13017 14385 10044 7649 6598 4497 2766
2003 17051 20182 18564 18484 15365 12180 13313 8859 6830 6371 3781 2012
2004 16875 20084 21150 19057 16153 13619 14144 9599 7390 5830 3763 2033
2005 20002 24153 23160 20864 18331 14950 14149 11086 7475 6290 3779 2134
2006 24605 26384 24858 20634 18951 15048 14905 10749 7259 5479 3074 1509
2007 29281 26495 25974 21427 20232 15465 15738 10006 6674 5301 2857 1304
2008 32961 24290 20190 17587 12172 7369 16175 6822 4364 2699 1174 667
2009 10996 8793 7345 5558 4840 4833 4355 2422 2272 1596 948 474
2010 10469 11707 12379 9599 8893 8314 7018 5310 4683 3742 2146 647
2011 13624 13470 12390 11171 9359 9240 6953 3653 2861 2216 1398 597
2012 14507 10993 10581 9388 7986 5481 6164 3736 2783 2442 1421 774
2013 10735 9671 10596 8113 7095 3293 9306 4504 3257 2832 1307 639
2014 15975 11906 11485 11757 7767 3390 14037 6201 4376 3082 1465 920
2015 20105 15384 17054 13166 9027 3924 21290 8572 5924 3943 1874 847
2016 27106 21173 20096 14847 10125 4143 22462 9781 5842 3831 1846 679
2017 26668 16905 17180 13427 9581 3585 21316 8105 4828 3255 1594 601
2018 25813 16501 16088 11557 9362 3716 20743 7681 4397 2874 1647 778
2019 22279 14178 14404 13794 9126 3858 18741 7202 4104 3214 1676 729
2020 20665 13263 10239 1338 1490 2189 15329 7360 5747 4189 1468 1032
2021 16948 11672 10672 8214 7337 4980 20232 8563 6354 3882 2167 832
When I attempt rudimentary code to plot the data I get the following
plot(tsf)
'Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"'
My data is monthly, so the 12 months exceed this apparent limit of 10 series. I've been able to make a plot by excluding two months, but this is not practical for me.
I've looked at lots of answers on this, many of which recommend ggplot() from {ggplot2}.
The link below had data most closely resembling my data but I still wasn't able to apply it.
issues plotting multivariate time series in R
Any help greatly appreciated.
I think the problem is with the shape of your data. It's indicating Frequency = 1, showing that it thinks the monthly columns are separate yearly time series, rather than a continuous time series across months. To plot the whole time length you can reshape your time series to match monthly frequency (from a simulated dataset of values):
# transpose so the values run Jan 1995, Feb 1995, ... and rebuild as one monthly series
tsf_switched <- ts(as.vector(t(tsf)), start = 1995, frequency = 12)
plot(tsf_switched)
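A small self-contained sketch of why the transpose matters, using a hypothetical 3-year matrix in place of tsf (the names m, wide and monthly and the values 1:36 are illustrative only). The values are chosen so that the correct chronological order is simply 1, 2, 3, ...:

```r
# Hypothetical data laid out like tsf: one row per year, one column per month
m <- matrix(1:36, nrow = 3, byrow = TRUE,
            dimnames = list(1995:1997, month.abb))
wide <- ts(m, start = 1995, frequency = 1)   # 12 yearly series, as in the question

# t() puts months in rows, so as.vector() reads Jan 1995, Feb 1995, ...
monthly <- ts(as.vector(t(wide)), start = c(1995, 1), frequency = 12)

# Print the first 14 months to confirm the ordering
print(window(monthly, end = c(1996, 2)))
```

Calling plot(monthly) then draws a single line over all months instead of one panel per month column.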
Created on 2022-05-07 by the reprex package (v2.0.1)
One solution with {ggplot2} and two convenience libraries:
library(dplyr)
library(ggplot2)
library(tsbox) ## for convenient ts -> dataframe conversion
library(lubridate) ## time and date manipulation
## example monthly data for twelve years:
example_ts <- ts(runif(144), start = c(2010, 1), end = c(2021, 12), frequency = 12)
ts_data.frame(example_ts) %>% ## {tsbox}
  mutate(year = year(time),
         month = month(time) ## numeric month keeps the x axis in calendar order
  ) %>%
  ggplot() +
  geom_line(aes(month,
                value,
                group = year
  )
  ) +
  scale_x_continuous(breaks = 1:12)
Ways to convert time series to data frames (required as ggplot input): Transforming a time-series into a data frame and back

Converting 1 minute data to 5 minutes using R

I am trying to convert 1-minute data to 5-minute data in R using POSIXct, but I am unable to do so.
My data is in this format.
Date Time Price Volume No.of.trades
1 01-06-2012 09:15 4901.895 283550 1286
2 01-06-2012 09:16 4907.046 140000 831
3 01-06-2012 09:17 4904.140 96900 639
4 01-06-2012 09:18 4900.609 84350 553
5 01-06-2012 09:19 4900.067 76450 516
6 01-06-2012 09:20 4898.378 84900 551
dt_tm <- as.POSIXct(paste(x[,1], x[,2]),
format="%d-%m-%Y %H:%M", tz="UTC")
cable <- xts(x[,3:5], order.by=dt_tm)
Price Volume No.of.trades
2012-01-07 09:15:00 6054.890 139750 787
2012-01-07 09:16:00 6051.176 56550 335
2012-01-07 09:17:00 6045.232 127400 691
2012-01-07 09:18:00 6039.950 59950 374
2012-01-07 09:19:00 6042.292 55450 214
2012-01-07 09:20:00 6044.140 53600 246
After this step I get a different series, one that does not match the start of my data.
Further, I have to use this code to convert my data to 5 minutes:
colnames(cable)[1] <- "CLOSE"
trades5 <-to.minutes5(cable, indexAt='startof', name=NULL)
Please tell me where I am going wrong, and suggest any other way of converting this type of data to 5 minutes.
I am still facing a problem related to the date structure. The dates in my data are day-month-year. You suggested swapping the day and month; I did that and head() shows the desired outcome, but tail() shows a problem. Initially the tail was:
Date Time Price Volume No.of.trades
91561 31-05-2013 15:25 6004.504 86550 622
91562 31-05-2013 15:26 6003.709 117750 651
91563 31-05-2013 15:27 6000.656 160950 856
91564 31-05-2013 15:28 5997.516 215950 1191
91565 31-05-2013 15:29 5995.305 303200 1784
Now, with the following code:
dt_tm <- as.POSIXct(paste(x[,1], x[,2]),
format="%m-%d-%Y %H:%M", tz="UTC")
ct <- cut(dt_tm, breaks="5 mins")
ct_tm <- as.POSIXct(as.character(ct))
cable <- xts(x[,3:5], order.by=ct_tm)
head(cable)
Price Volume No.of.trades
2012-01-06 09:15:00 4901.895 283550 1286
2012-01-06 09:15:00 4907.046 140000 831
2012-01-06 09:15:00 4904.140 96900 639
2012-01-06 09:15:00 4900.609 84350 553
2012-01-06 09:15:00 4900.067 76450 516
2012-01-06 09:20:00 4898.378 84900 551
But when I look at the tail:
tail(cable)
Price Volume No.of.trades
<NA> 6004.504 86550 622
<NA> 6003.709 117750 651
<NA> 6000.656 160950 856
<NA> 5997.516 215950 1191
<NA> 5995.305 303200 1784
<NA> 5991.419 550 8
Kindly help me find where I am going wrong now.
I think you may be formatting your data incorrectly - swapping day and month.
dt_tm <- as.POSIXct(paste(x[,1], x[,2]),
format="%m-%d-%Y %H:%M", tz="UTC")
Re-stamping the rows onto 5-minute intervals can be achieved like this:
# cut dt_tm to 5 minutes intervals
ct <- cut(dt_tm, breaks="5 mins")
# convert to POSIXct
ct_tm <- as.POSIXct(as.character(ct))
# index each row by the start of its 5-minute bin (rows are not aggregated yet)
cable <- xts(x[,3:5], order.by=ct_tm)
Time Price Volume
2012-01-06 09:15:00 "09:15" "4901.895" "283550"
2012-01-06 09:15:00 "09:16" "4907.046" "140000"
2012-01-06 09:15:00 "09:17" "4904.140" " 96900"
2012-01-06 09:15:00 "09:18" "4900.609" " 84350"
2012-01-06 09:15:00 "09:19" "4900.067" " 76450"
2012-01-06 09:20:00 "09:20" "4898.378" " 84900"
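The NA timestamps in the question's tail() output are consistent with month-first parsing: a date such as 31-05-2013 has no valid reading as month 31. Below is a base-R sketch, under the assumption that the dates really are day-month-year, which also collapses each 5-minute bin into one row. The sample rows are copied from the question; the names stamp, wrong, right, bin and five are illustrative, and taking the last price as the close and the summed volume is one possible aggregation rule, not necessarily the one to.minutes5 applies.

```r
# Sample rows in the question's day-month-year layout
x <- data.frame(
  Date   = c(rep("01-06-2012", 6), "31-05-2013"),
  Time   = c("09:15", "09:16", "09:17", "09:18", "09:19", "09:20", "15:25"),
  Price  = c(4901.895, 4907.046, 4904.140, 4900.609, 4900.067, 4898.378, 6004.504),
  Volume = c(283550, 140000, 96900, 84350, 76450, 84900, 86550)
)
stamp <- paste(x$Date, x$Time)

# Month-first parsing cannot represent day 31, so that row becomes NA
wrong <- as.POSIXct(stamp, format = "%m-%d-%Y %H:%M", tz = "UTC")
right <- as.POSIXct(stamp, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Floor each timestamp to the start of its 5-minute bin (300 seconds)
bin <- as.POSIXct(floor(as.numeric(right) / 300) * 300,
                  origin = "1970-01-01", tz = "UTC")

# One row per bin: last price as the close, summed volume
# (assumes the rows are already in time order within each bin)
five <- data.frame(
  Time   = sort(unique(bin)),
  Close  = as.vector(tapply(x$Price, bin, function(p) p[length(p)])),
  Volume = as.vector(tapply(x$Volume, bin, sum))
)
print(five)
```

With day-first parsing, 01-06-2012 is read as 1 June 2012 rather than 6 January 2012, which would also explain why the sorted xts object in the question starts with rows from a different part of the file.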

How can I apply "sapply" in R with multiple codes in one function?

I am a new R user. I have a simple sapply example that calculates the mean and sd for a split data frame. My data contains half-hourly wind speed with direction. For my study I want to know the daily Weibull distribution over 13 years, which is why my dataset is split by time.
My data looks like this:
Time windspeed direction Date day_index
1 24/07/2000 13:00 31 310 2000-07-24 13:00:00 2000_206
2 24/07/2000 13:30 41 320 2000-07-24 13:30:00 2000_206
3 24/07/2000 14:30 37 290 2000-07-24 14:30:00 2000_206
4 24/07/2000 15:00 30 300 2000-07-24 15:00:00 2000_206
5 24/07/2000 15:30 24 320 2000-07-24 15:30:00 2000_206
6 24/07/2000 16:00 22 330 2000-07-24 16:00:00 2000_206
7 24/07/2000 16:30 37 270 2000-07-24 16:30:00 2000_206
The example R code I have for the split-apply step that loops over the days is:
my.summary <- sapply(split(ballarat_alldata[1:200, ],
ballarat_alldata$day_index[1:200]),
function(x) {
return(c(my.mean=mean(x$windspeed),
my.sd=sd(x$windspeed)))
})
The Weibull distribution code to calculate shape and scale parameters is:
set1 <- createSet(height=10,
v.avg=ballarat_alldata[,2],
dir.avg=ballarat_alldata[,3])
time_ballarat <- strptime(ballarat_alldata[,1], "%d/%m/%Y %H:%M")
ballarat <- createMast(time.stamp=time_ballarat, set1)
ballarat <- clean(mast=ballarat)
ballarat.wb <- weibull(mast=ballarat, v.set=1, print=FALSE)
How can I combine these two sets of R code to calculate the Weibull parameters for each day and store them in a matrix?
I tried many ways, but none of them worked.
If these two sets of R code are combined, should I also change the wind speed and direction range in set1 <- createSet(height=10, v.avg=ballarat_alldata[,2], dir.avg=ballarat_alldata[,3])?
It seems as though you have 2 separate problems here: 1) aggregating your data 2) calculating Weibull parameters. For the first question I can recommend something like:
library(plyr)
Wind <- ddply(Wind, .(as.Date(Date)), transform,
Wind.mean = mean(windspeed), Wind.sd = sd(windspeed))
# windspeed direction Date2 Time2 day_index Wind.mean Wind.sd
# 1 31 310 2000-07-24 13:00:00 2000_206 36.33333 5.033223
# 2 41 320 2000-07-24 13:30:00 2000_206 36.33333 5.033223
# 3 37 290 2000-07-24 14:30:00 2000_206 36.33333 5.033223
# 4 30 300 2000-07-25 15:00:00 2000_206 28.25000 6.751543
# 5 24 320 2000-07-25 15:30:00 2000_206 28.25000 6.751543
# 6 22 330 2000-07-25 16:00:00 2000_206 28.25000 6.751543
# 7 37 270 2000-07-25 16:30:00 2000_206 28.25000 6.751543
If you give me a bit more of a hint about how you are calculating the parameters, you could also use summarise from the plyr library, something like:
ddply(Wind, .(Date2), summarise, rweibull(# I'm not sure what goes here
Hope this helps.
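For the "store in a matrix" part, here is a base-R sketch of the split-apply pattern on a hypothetical two-day sample (the data frame wind, the second day index 2000_207 and the function daily_stats are illustrative). daily_stats is only a stand-in: the per-day Weibull fit from the question would presumably go in its place, built from the per-day subset d rather than from the full ballarat_alldata, as long as it returns a numeric vector of fixed length.

```r
# Hypothetical sample resembling ballarat_alldata, with two days
wind <- data.frame(
  windspeed = c(31, 41, 37, 30, 24, 22, 37),
  day_index = c(rep("2000_206", 3), rep("2000_207", 4))
)

# Stand-in for the per-day fit: any function returning a fixed-length named vector
daily_stats <- function(d) {
  c(my.mean = mean(d$windspeed), my.sd = sd(d$windspeed), n = nrow(d))
}

# sapply() binds the per-day vectors as columns; t() gives one row per day
by_day <- t(sapply(split(wind, wind$day_index), daily_stats))
print(by_day)
```

Because the function receives one day's rows at a time, the column selections inside it refer to d, which is the likely answer to whether the ranges passed to createSet need to change.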

Troubles in applying the zoo aggregate function to a time series

We have the following function to compute monthly returns from a daily series of prices:
PricesRet = diff(Prices)/lag(Prices,k=-1)
tail(PricesRet)
# Monthly simple returns
MonRet = aggregate(PricesRet+1, as.yearmon, prod)-1
tail(MonRet)
The problem is that it returns wrong values. Take, for example, the simple return for February 2013: the function returns -0.003517301, while it should have been -0.01304773.
Why does that happen?
Here are the last prices observations:
> tail(Prices,30)
Prices
2013-01-22 165.5086
2013-01-23 165.2842
2013-01-24 168.4845
2013-01-25 170.6041
2013-01-28 169.7373
2013-01-29 169.8724
2013-01-30 170.6554
2013-01-31 170.7210
2013-02-01 173.8043
2013-02-04 172.2145
2013-02-05 172.8400
2013-02-06 172.8333
2013-02-07 171.3586
2013-02-08 170.5602
2013-02-11 171.2172
2013-02-12 171.4126
2013-02-13 171.8687
2013-02-14 170.7955
2013-02-15 171.2848
2013-02-19 170.9482
2013-02-20 171.6355
2013-02-21 170.0300
2013-02-22 169.9319
2013-02-25 170.9035
2013-02-26 168.6822
2013-02-27 168.5180
2013-02-28 168.4935
2013-03-01 169.6546
2013-03-04 169.3076
2013-03-05 169.0579
Here are price returns:
> tail(PricesRet,50)
PricesRet
2012-12-18 0.0055865274
2012-12-19 -0.0015461900
2012-12-20 -0.0076140194
2012-12-23 0.0032656346
2012-12-26 0.0147750923
2012-12-27 0.0013482760
2012-12-30 -0.0004768131
2013-01-01 0.0128908541
2013-01-02 -0.0047646818
2013-01-03 0.0103372029
2013-01-06 -0.0024547278
2013-01-07 -0.0076920352
2013-01-08 0.0064368720
2013-01-09 0.0119663301
2013-01-10 0.0153828814
2013-01-13 0.0050590540
2013-01-14 -0.0053324785
2013-01-15 -0.0027043105
2013-01-16 0.0118840383
2013-01-17 -0.0005876459
2013-01-21 -0.0145541598
2013-01-22 -0.0013555548
2013-01-23 0.0193624621
2013-01-24 0.0125802978
2013-01-27 -0.0050807744
2013-01-28 0.0007959058
2013-01-29 0.0046096266
2013-01-30 0.0003844082
2013-01-31 0.0180603867
2013-02-03 -0.0091473127
2013-02-04 0.0036322298
2013-02-05 -0.0000390941
2013-02-06 -0.0085320734
2013-02-07 -0.0046591956
2013-02-10 0.0038517581
2013-02-11 0.0011412046
2013-02-12 0.0026607502
2013-02-13 -0.0062440496
2013-02-14 0.0028645616
2013-02-18 -0.0019651341
2013-02-19 0.0040206637
2013-02-20 -0.0093543648
2013-02-21 -0.0005764665
2013-02-24 0.0057176118
2013-02-25 -0.0129979321
2013-02-26 -0.0009730782
2013-02-27 -0.0001453191
2013-02-28 0.0068911863
2013-03-03 -0.0020455332
2013-03-04 -0.0014747845
The results of the function is instead:
> tail(data.frame(MonRet))
MonRet
ott 2012 -0.000848156
nov 2012 0.009833881
dic 2012 0.033406884
gen 2013 0.087822700
feb 2013 -0.023875638
mar 2013 -0.003517301
Your returns are wrong. The return for 2013-01-23 should be:
> 165.2842/165.5086-1
[1] -0.001355821
but you have 0.0193624621. I suspect this is because Prices is an xts object, not a zoo object. lag.xts breaks with the convention of lag.ts and lag.zoo, where k=1 implies a "lag" of (t+1), in favour of the more common convention where k=1 implies a "lag" of (t-1).
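To check the expected February figure independently of any lag() convention, the returns can be computed from a plain numeric vector. This is a sketch using the closing prices listed in the question; the names p, r and feb are illustrative.

```r
# Closing prices from the question: 2013-01-31 followed by all of February 2013
p <- c(170.7210, 173.8043, 172.2145, 172.8400, 172.8333, 171.3586, 170.5602,
       171.2172, 171.4126, 171.8687, 170.7955, 171.2848, 170.9482, 171.6355,
       170.0300, 169.9319, 170.9035, 168.6822, 168.5180, 168.4935)

# Simple return at t: P_t / P_(t-1) - 1, i.e. today's change over yesterday's price
r <- diff(p) / head(p, -1)

# Compounding the daily returns gives the monthly return
feb <- prod(1 + r) - 1
print(feb)
```

The compounded value equals the last February price divided by the last January price, minus one, which is roughly -0.013 and in line with the figure the question expects.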

Extracting an index of dates and times, with varying opening and closing times, for minutely OHLC data in R

I would like to be able to get an index of dates and times that represent the opening and closing times of a financial stock-market index for each day.
However, the opening and closing times vary because of rule changes at the exchange or daylight saving, so I would like to be able to use this index to get accurate open-to-close returns.
I am currently looking at the Hang Seng futures index, which also has a lunch break in the middle of the day, and I would like the index to record this as well, i.e. I would have two open-to-close returns per day because of the lunch-break gap in the data. The timing of the lunch break is not always consistent, so subsetting the time series with the xts syntax xts["THH:MM/THH:MM"] would not work for getting open-to-close data for a specific day.
For example, the lunch-break times changed in March 2011, so comparing the lunch break on 14 Feb 2011 with the one on 14 March 2011 gives the following data...
> HI.raw.sing['20110214']["T12:25/T14:35"]
HI.Open HI.High HI.Low HI.Close HI.Volume
2011-02-14 12:25:00 23020 23028 23018 23018 180
2011-02-14 12:26:00 23018 23023 23014 23019 108
2011-02-14 12:27:00 23020 23033 23016 23033 142
2011-02-14 12:28:00 23031 23038 23025 23026 173
2011-02-14 12:29:00 23026 23046 23026 23042 264
2011-02-14 12:30:00 23044 23059 23041 23042 314
2011-02-14 14:30:00 23044 23044 23044 23044 311
2011-02-14 14:31:00 23118 23129 23099 23117 781
2011-02-14 14:32:00 23117 23143 23113 23143 554
2011-02-14 14:33:00 23143 23156 23139 23139 762
2011-02-14 14:34:00 23139 23161 23138 23138 644
2011-02-14 14:35:00 23139 23149 23137 23144 326
Warning message:
timezone of object (Asia/Singapore) is different than current timezone ().
> HI.raw.sing['20110314']["T11:55/T13:35"]
HI.Open HI.High HI.Low HI.Close HI.Volume
2011-03-14 11:55:00 23060 23075 23059 23071 195
2011-03-14 11:56:00 23071 23071 23059 23064 187
2011-03-14 11:57:00 23064 23074 23063 23068 96
2011-03-14 11:58:00 23069 23075 23068 23075 116
2011-03-14 11:59:00 23075 23078 23069 23073 120
2011-03-14 12:00:00 23073 23098 23073 23089 231
2011-03-14 13:30:00 23090 23090 23090 23090 103
2011-03-14 13:31:00 23082 23112 23074 23108 326
2011-03-14 13:32:00 23108 23124 23100 23123 179
2011-03-14 13:33:00 23124 23133 23111 23111 326
2011-03-14 13:34:00 23110 23119 23103 23115 148
2011-03-14 13:35:00 23115 23139 23114 23129 284
Warning message:
timezone of object (Asia/Singapore) is different than current timezone ().
Notice how the lunch break started at 12:30 on the 14th Feb 2011 but started at 12:00 on the 14th March.
Basically what I am looking for is an ability to detect these breaks in the timestamps. However using missing consecutive timestamp does not always work as there are sometimes missing minutes where nothing traded during the middle of the trading day, and so it is missed when the data was recorded. What I'm looking for is, gaps in the timeseries xts data greater than 5 minutes, output as a list which can be manipulated or be used as an index, which could help me subset the data easily.
You can use diff(index(x)) to identify holes exceeding 5 minutes.
# Sample data
k <- 100
library(xts)
x <- xts( rnorm(k), sort(Sys.time() + runif(k, 0, 5*3600)) )
# Gaps in seconds; fix the units explicitly, because differences of timestamps choose their units automatically
gaps <- as.numeric(diff(index(x)), units = "secs")
# Start of the breaks exceeding 5 minutes
i <- gaps > 300
close <- x[c(which(i),NROW(x))]
open <- x[c(1,which(i)+1)]
break_start <- index(close)
break_end <- index(open)
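The same idea in base R, sketched on timestamps resembling the 14 Feb 2011 extract from the question, with one quiet minute (12:27) deliberately left out to show that a short hole is not reported as a break. The names tm, gap_min and breaks are illustrative, and UTC is used only to keep the example portable; the real data is in Asia/Singapore time.

```r
# Timestamps around the lunch break, with 12:27 missing on purpose
ts_chr <- c("2011-02-14 12:25", "2011-02-14 12:26", "2011-02-14 12:28",
            "2011-02-14 12:29", "2011-02-14 12:30", "2011-02-14 14:30",
            "2011-02-14 14:31", "2011-02-14 14:32")
tm <- as.POSIXct(ts_chr, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Gap to the next timestamp, forced to minutes so the threshold is unambiguous
gap_min <- as.numeric(difftime(tm[-1], tm[-length(tm)], units = "mins"))
i <- which(gap_min > 5)

# One row per break: last bar before it, first bar after it
breaks <- data.frame(break_start = tm[i], break_end = tm[i + 1])
print(breaks)
```

The resulting data frame can be used as a lookup table: for any day, the rows falling on that date give the session boundaries without hard-coding "THH:MM/THH:MM" ranges.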
