xts::apply.weekly thinks Monday is the last day of the week - r

I have an R data.frame containing one value for every quarter of an hour:
Date A B
1 2015-11-02 00:00:00 0 0 //day start
2 2015-11-02 00:15:00 0 0
3 2015-11-02 00:30:00 0 0
4 2015-11-02 00:45:00 0 0
...
96 2015-11-02 23:45:00 0 0 //day end
97 2015-11-03 00:00:00 0 0 //new day
...
6 2016-03-23 01:15:00 0 0 //last record
I use xts to construct a time series
xtsA <- xts(data$A,data$Date)
by using apply.daily I get the result I expect
apply.daily(xtsA, sum)
Date A
1 2015-11-02 23:45:00 400
2 2015-11-03 23:45:00 400
3 2015-11-04 23:45:00 500
but apply.weekly seems to use Monday as last day of the week
Date A
19 2016-03-07 00:45:00 6500 //Monday
20 2016-03-14 00:45:00 5500 //Monday
21 2016-03-21 00:45:00 5000 //Monday
and I do not understand why it uses 00:45:00. Does anyone know?

The data is imported from a CSV file; the Date column looks like this:
data <- read.csv("...", header=TRUE)
Date A
1 151102 0000 0
...
The error is in the date time interpretation and using
data$Date <- as.POSIXct(strptime(data$Date, "%y%m%d %H%M"), tz = "GMT")
solves it, and now apply.weekly returns
Date A
1 2015-11-08 23:45:00 3500 //Sunday
2 2015-11-15 23:45:00 4000 //Sunday
...
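A small base R sketch of why the week seemed to end on Monday at 00:45. It assumes the original timestamps were parsed in a local zone one hour ahead of UTC (Europe/Berlin is only an example; the poster's actual zone is not stated) and that the weekly boundaries are computed on the underlying UTC-based index, which is my understanding of how xts behaves here. Under those assumptions, the last quarter-hour of Sunday in UTC prints as Monday 00:45 locally:

```r
# Example zone is an assumption; any zone at UTC+1 shows the same shift.
local_ts <- as.POSIXct("2016-03-07 00:45:00", tz = "Europe/Berlin")

# Weekday as seen in the local zone: 1 = Monday (0 would be Sunday)
local_wday <- as.POSIXlt(local_ts)$wday

# The same instant viewed in UTC
utc_view <- as.POSIXlt(local_ts, tz = "UTC")
utc_wday <- utc_view$wday            # 0 = Sunday
utc_time <- format(utc_view, "%H:%M")

local_wday
utc_wday
utc_time
```

Parsing the column with tz = "GMT", as in the fix above, makes the printed time and the underlying index agree, so the weekly groups end on Sunday 23:45.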

Related

How to use own data with quantstrat and quantmod?

I am learning quantstrat and am working on a project where I use a local CSV file which I exported from MetaTrader 5. I managed to load the data into an xts object called fulldata_xts, from which I created the subsets bt_xts and wf_xts for the backtest and walk forward respectively. Below is the head of fulldata_xts. I have added columns beyond the standard OHLCV.
EURUSD.Open EURUSD.High EURUSD.Low
2010-01-03 16:00:00 1.43259 1.43336 1.43151
2010-01-03 17:00:00 1.43151 1.43153 1.42879
2010-01-03 18:00:00 1.42885 1.42885 1.42569
2010-01-03 19:00:00 1.42702 1.42989 1.42700
2010-01-03 20:00:00 1.42938 1.42968 1.42718
2010-01-03 21:00:00 1.42847 1.42985 1.42822
EURUSD.Close EURUSD.Volume EURUSD.Vol
2010-01-03 16:00:00 1.43153 969 0
2010-01-03 17:00:00 1.42886 2098 0
2010-01-03 18:00:00 1.42705 2082 0
2010-01-03 19:00:00 1.42939 1544 0
2010-01-03 20:00:00 1.42848 1131 0
2010-01-03 21:00:00 1.42897 1040 0
EURUSD.Spread EURUSD.Year EURUSD.Month
2010-01-03 16:00:00 12 2010 1
2010-01-03 17:00:00 15 2010 1
2010-01-03 18:00:00 15 2010 1
2010-01-03 19:00:00 14 2010 1
2010-01-03 20:00:00 15 2010 1
2010-01-03 21:00:00 14 2010 1
EURUSD.Day EURUSD.Weekday EURUSD.Hour
2010-01-03 16:00:00 4 2 0
2010-01-03 17:00:00 4 2 1
2010-01-03 18:00:00 4 2 2
2010-01-03 19:00:00 4 2 3
2010-01-03 20:00:00 4 2 4
2010-01-03 21:00:00 4 2 5
EURUSD.Session EURUSD.EMA14
2010-01-03 16:00:00 0 NA
2010-01-03 17:00:00 0 NA
2010-01-03 18:00:00 0 NA
2010-01-03 19:00:00 0 NA
2010-01-03 20:00:00 0 NA
2010-01-03 21:00:00 0 NA
EURUSD.EMA14_Out
2010-01-03 16:00:00 0
2010-01-03 17:00:00 0
2010-01-03 18:00:00 0
2010-01-03 19:00:00 0
2010-01-03 20:00:00 0
2010-01-03 21:00:00 0
I am trying to create my own indicator using the following code:
add.indicator(strategy1.st, name = sentiment,
arguments = list(date = quote(Cl(mktdata))),
label = "sentiment")
I based the above code on a DataCamp course, but it is similar to what is being discussed here. My questions are:
How can I specify my own data, i.e. bt_xts, in the code above? Please correct me if I am wrong, but from what I gather the mktdata object gets created when the data is downloaded using quantstrat facilities, which does not apply in my case since I read the data from a CSV file and converted it to a data table and then to an xts object.
The function sentiment inside the add.indicator call above for now only returns 0, 1 or 2 (stay out, bullish, bearish) based on the day of the week. I plan to develop this further once I get the other parts of the strategy working. This function takes a variable date, hence the arguments = list(date = quote(Cl(mktdata))) part is incorrect. What should I put inside quote() to specify the date column of my data, bt_xts?
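For the second question, it may help to note that the timestamps of an xts object are not a column; they are its index, retrieved with index(). A weekday-based function can therefore work on a plain vector of date-times. The sketch below uses base R only; the name sentiment_by_weekday and the weekday rule are hypothetical placeholders, not the poster's actual function, and it returns a plain integer vector rather than an xts object.

```r
# Hypothetical stand-in for the poster's sentiment function.
# Returns 0 (stay out), 1 (bullish) or 2 (bearish) per timestamp.
sentiment_by_weekday <- function(dates) {
  wday <- as.POSIXlt(dates)$wday   # 0 = Sunday ... 6 = Saturday
  out <- rep(0L, length(wday))     # default: stay out
  out[wday %in% c(1, 2)] <- 1L     # made-up rule: bullish on Mon/Tue
  out[wday %in% c(4, 5)] <- 2L     # made-up rule: bearish on Thu/Fri
  out
}

dates <- as.POSIXct(c("2010-01-03 16:00:00",   # Sunday
                      "2010-01-04 16:00:00",   # Monday
                      "2010-01-07 16:00:00"),  # Thursday
                    tz = "UTC")
signals <- sentiment_by_weekday(dates)
signals
```

With an xts object, the dates argument would presumably be supplied as index(bt_xts); whether quote(index(mktdata)) is the right expression inside add.indicator is an untested guess.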

R: calculate number of occurrences which have started but not ended - count if within a datetime range

I've got a dataset with the following shape
ID Start Time End Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00
I would like to count, every 15 minutes for an entire year, how many items have started but not finished, i.e. count the items with a start time less than or equal to the time I'm looking at and an end time greater than the time I'm looking at.
I'm looking for an approach using tidyverse/dplyr if possible.
Any help or guidance would be very much appreciated.
If I understand correctly, the OP wants to count the number of simultaneously active events.
One possibility to tackle this question is the coverage() function from Bioconductor's IRanges package. Another one is to aggregate in a non-equi join, which is available with the data.table package.
Non-equi join
# create sequence of datetimes (limited to 4 days for demonstration)
seq15 <- seq(lubridate::as_datetime("2017-01-01"),
lubridate::as_datetime("2017-01-05"), by = "15 mins")
# aggregate within a non-equi join
library(data.table)
result <- periods[.(time = seq15), on = .(Start.Time <= time, End.Time > time),
.(time, count = sum(!is.na(ID))), by = .EACHI][, .(time, count)]
result
time count
1: 2017-01-01 00:00:00 0
2: 2017-01-01 00:15:00 1
3: 2017-01-01 00:30:00 1
4: 2017-01-01 00:45:00 1
5: 2017-01-01 01:00:00 1
---
381: 2017-01-04 23:00:00 0
382: 2017-01-04 23:15:00 0
383: 2017-01-04 23:30:00 0
384: 2017-01-04 23:45:00 0
385: 2017-01-05 00:00:00 0
The result can be visualized graphically:
library(ggplot2)
ggplot(result) + aes(time, count) + geom_step()
Data
periods <- readr::read_table(
"ID Start.Time End.Time
1 01/01/2017 00:15:00 01/01/2017 07:15:00
2 01/01/2017 04:45:00 01/01/2017 06:15:00
3 01/01/2017 10:20:00 01/01/2017 20:15:00
4 01/01/2017 02:15:00 01/01/2017 00:15:00
5 02/01/2017 15:15:00 03/01/2017 00:30:00
6 03/01/2017 07:00:00 04/01/2017 09:15:00"
)
# convert date-time strings to class POSIXct
library(data.table)
cols <- names(periods)[names(periods) %like% "Time$"]
setDT(periods)[, (cols) := lapply(.SD, lubridate::dmy_hms), .SDcols = cols]
periods
ID Start.Time End.Time
1: 1 2017-01-01 00:15:00 2017-01-01 07:15:00
2: 2 2017-01-01 04:45:00 2017-01-01 06:15:00
3: 3 2017-01-01 10:20:00 2017-01-01 20:15:00
4: 4 2017-01-01 02:15:00 2017-01-01 00:15:00
5: 5 2017-01-02 15:15:00 2017-01-03 00:30:00
6: 6 2017-01-03 07:00:00 2017-01-04 09:15:00
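The join condition above (Start.Time <= time, End.Time > time) can also be checked by hand in base R. This is only a sketch on the first two rows of the sample data and a shortened grid; the variable names are made up for the illustration, and the loop is far slower than the non-equi join for a full year of data.

```r
# First two sample intervals (IDs 1 and 2)
starts <- as.POSIXct(c("2017-01-01 00:15:00", "2017-01-01 04:45:00"), tz = "UTC")
ends   <- as.POSIXct(c("2017-01-01 07:15:00", "2017-01-01 06:15:00"), tz = "UTC")

# 15-minute grid, shortened to eight hours for the example
grid <- seq(as.POSIXct("2017-01-01 00:00:00", tz = "UTC"),
            as.POSIXct("2017-01-01 08:00:00", tz = "UTC"),
            by = "15 min")

# For each grid point, count intervals that have started but not yet ended
active <- vapply(seq_along(grid),
                 function(i) sum(starts <= grid[i] & ends > grid[i]),
                 integer(1))

data.frame(time = grid, count = active)
```

An interval counts as active from its start time up to, but not including, its end time, which matches the half-open condition used in the join.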

My data.frame is not readable with dygraph

I would like to produce a dygraph from a data.frame that I import from a CSV file. I suspect my date column is formatted incorrectly. My date column is originally in %m/%d/%y format.
If applicable, column 1 is class(factor), column 2 and 3 are class(integer). Here is head(mydata)
Date term1 term2
1 7/1/16 2304 0
2 7/2/16 2304 0
3 7/3/16 1628 0
4 7/4/16 1230 0
5 7/5/16 1216 5
6 7/6/16 2056 0
Here is the dygraph command:
library(tidyverse)
library(dygraphs)
dygraph(mydata, main = "mydata") %>%
dyRangeSelector()
I received the error: Unsupported type passed to argument 'data'.
I then converted mydata$Date to POSIXct like this:
mydata$DateTime=as.POSIXct(paste(mydata$Date, mydata$Time), format="%Y%m%d %H%M%S")
I expected the above to correct the problem, but I still receive the same error. When I view(mydata), I see this:
Date term1 term2 DateTime
1 <NA> 2304 0 <NA>
2 <NA> 2304 0 <NA>
3 <NA> 1628 0 <NA>
4 <NA> 1230 0 <NA>
5 <NA> 1216 5 <NA>
6 <NA> 2056 0 <NA>
Clearly, this only worsened the problem.
I was able to use dygraph on imported stock data, and based on head(my stock data) the correct head(mydata) would look like this:
Date Open High Low Close Volume
2016-02-03 2016-02-02 18:00:00 18.00 18.88 16.000 18.20 4157398
2016-02-04 2016-02-03 18:00:00 18.26 19.42 17.570 18.50 469900
2016-02-05 2016-02-04 18:00:00 18.84 18.88 17.520 17.60 219900
2016-02-08 2016-02-07 18:00:00 17.52 18.00 15.720 15.85 372100
2016-02-09 2016-02-08 18:00:00 15.50 15.50 12.748 12.81 744100
2016-02-10 2016-02-09 18:00:00 13.01 14.00 12.790 13.09 260800
Thank you in advance for everyone's time & insight.
-M
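library(zoo)       # na.locf()
library(xts)
library(dygraphs)  # dygraph(), dyRangeSelector()
# parse the date column; the format belongs to as.Date(), not as.character()
Date <- mydata6$Date <- as.Date(as.character(mydata6$Date), format = "%Y-%m-%d")
# fill gaps with the last observation and make sure the columns are numeric
Open <- mydata6$Open <- as.numeric(na.locf(mydata6$Open))
High <- mydata6$High <- as.numeric(na.locf(mydata6$High))
Z <- cbind(Open, High)
# index the series by the dates parsed above
newdata <- xts(Z, order.by = Date)
dygraph(newdata, main = "Stock") %>%
dyRangeSelector()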
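For the data in the question itself, the all-NA result most likely comes from the format string: "%Y%m%d %H%M%S" does not match values such as 7/1/16, and there is no Time column to paste in. A minimal base R sketch of the parsing step, assuming the Date column is a factor holding %m/%d/%y strings as described (the vector raw stands in for mydata$Date):

```r
# Stand-in for mydata$Date: a factor of month/day/two-digit-year strings
raw <- factor(c("7/1/16", "7/2/16", "7/3/16"))

# Convert factor -> character first, then parse with the matching format
parsed <- as.Date(as.character(raw), format = "%m/%d/%y")
parsed
```

The parsed dates could then serve as the index of an xts object, for example xts(mydata[, c("term1", "term2")], order.by = parsed), which is the kind of input dygraph accepts.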

insert a new row in R based on time interval checking

ALL;
I have a data file with two columns: one is a time series, the other holds values. Normally the time interval between two rows is exactly 5 minutes, but sometimes it is larger than 5 minutes.
A sample is as below:
dd <- data.table(date = c("2015-07-01 00:00:00", "2015-07-01 00:05:00", "2015-07-01 00:20:00","2015-07-01 00:25:00","2015-07-01 00:30:00"),
value = c(9,1,10,12,0))
What I want to do is check the time interval between two rows and, when it is larger than 5 minutes, insert new rows with value 0 to fill the gap, so the result would be:
date value
2015-07-01 00:00:00 9
2015-07-01 00:05:00 1
2015-07-01 00:10:00 0
2015-07-01 00:15:00 0
2015-07-01 00:20:00 10
2015-07-01 00:25:00 12
2015-07-01 00:30:00 0
Any suggestions and ideas are welcome :)
We can do a join after converting 'date' to a date-time class (POSIXct):
dd[, date := as.POSIXct(date)][]
dd[dd[, .(date=seq(min(date), max(date), by = "5 min"))], on = 'date'
][is.na(value), value := 0][]
# date value
#1: 2015-07-01 00:00:00 9
#2: 2015-07-01 00:05:00 1
#3: 2015-07-01 00:10:00 0
#4: 2015-07-01 00:15:00 0
#5: 2015-07-01 00:20:00 10
#6: 2015-07-01 00:25:00 12
#7: 2015-07-01 00:30:00 0
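The same idea can be sketched without data.table: build the complete 5-minute sequence, left-join the observed rows onto it, and replace the missing values with 0. This base R version is only an illustration of the technique on the sample data (the names full and filled are made up), not a replacement for the join above.

```r
dd <- data.frame(
  date = as.POSIXct(c("2015-07-01 00:00:00", "2015-07-01 00:05:00",
                      "2015-07-01 00:20:00", "2015-07-01 00:25:00",
                      "2015-07-01 00:30:00"), tz = "UTC"),
  value = c(9, 1, 10, 12, 0)
)

# Complete grid of timestamps from the first to the last observation
full <- data.frame(date = seq(min(dd$date), max(dd$date), by = "5 min"))

# Keep every grid row; rows without a match get NA in 'value'
filled <- merge(full, dd, by = "date", all.x = TRUE)

# Gaps become 0
filled$value[is.na(filled$value)] <- 0
filled
```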

How to convert Date or Datetime field when some parts are blank; na.omit fails

I have a data set that has dates and times for in and out. Each line is an in and out set, but some are blank. I can remove the blanks with na.omit and a suitable read (it was a CSV, and na.strings=c("") works with read.csv).
Of course, because the real world is never like the tutorial, some of the values are only dates, so my as.POSIXlt(Dataset$In,format="%m/%d/%Y %H:%M") returns NA for the entries that have a date but no time.
na.omit does not remove these lines, so I have two questions:
Why doesn't na.omit work, or how can I get it to work?
Better, how can I convert one column into both dates and times (in the POSIX format) without two calls, or with some sort of optional parameter in the format string? (Or is this even possible?)
This is a sample of the dates and times. I can't share the real file: first, it's huge; second, it contains PII.
Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00
3,,
4,,
4,8/16/2015 6:00,8/16/2015 20:00
DF <- read.table(text = "Id,In,Out
1,8/15/2015 8:00,8/15/2015 17:00
1,8/16/2015 8:04,8/16/2015
1,8/17/2015 8:50,8/17/2015 18:00
1,8/18/2015,8/18/2015 17:00
2,8/15/2015,8/15/2015 13:00
2,8/16/2015 8:00,8/16/2015 17:00
3,8/15/2015 4:00,8/15/2015 11:00
3,8/16/2015 9:00,8/16/2015 19:00
3,8/17/2015,8/17/2015 17:00", header = TRUE, sep = ",",
stringsAsFactors = FALSE) #set this option during import
DF$In[nchar(DF$In) < 13] <- paste(DF$In[nchar(DF$In) < 13], "0:00")
DF$Out[nchar(DF$Out) < 13] <- paste(DF$Out[nchar(DF$Out) < 13], "0:00")
DF$In <- as.POSIXct(DF$In, format = "%m/%d/%Y %H:%M", tz = "GMT")
DF$Out <- as.POSIXct(DF$Out, format = "%m/%d/%Y %H:%M", tz = "GMT")
# Id In Out
#1 1 2015-08-15 08:00:00 2015-08-15 17:00:00
#2 1 2015-08-16 08:04:00 2015-08-16 00:00:00
#3 1 2015-08-17 08:50:00 2015-08-17 18:00:00
#4 1 2015-08-18 00:00:00 2015-08-18 17:00:00
#5 2 2015-08-15 00:00:00 2015-08-15 13:00:00
#6 2 2015-08-16 08:00:00 2015-08-16 17:00:00
#7 3 2015-08-15 04:00:00 2015-08-15 11:00:00
#8 3 2015-08-16 09:00:00 2015-08-16 19:00:00
#9 3 2015-08-17 00:00:00 2015-08-17 17:00:00
na.omit doesn't work with POSIXlt objects because it is documented to "handle vectors, matrices and data frames comprising vectors and matrices (only)." (see help("na.omit")). And in the strict sense, POSIXlt objects are not vectors:
unclass(as.POSIXlt(DF$In))
#$sec
#[1] 0 0 0 0 0 0 0 0 0
#
#$min
#[1] 0 4 50 0 0 0 0 0 0
#
#$hour
#[1] 8 8 8 0 0 8 4 9 0
#
#$mday
#[1] 15 16 17 18 15 16 15 16 17
#
#$mon
#[1] 7 7 7 7 7 7 7 7 7
#
#$year
#[1] 115 115 115 115 115 115 115 115 115
#
#$wday
#[1] 6 0 1 2 6 0 6 0 1
#
#$yday
#[1] 226 227 228 229 226 227 226 227 228
#
#$isdst
#[1] 0 0 0 0 0 0 0 0 0
#
#attr(,"tzone")
#[1] "GMT"
There is hardly any reason to prefer POSIXlt over POSIXct (which internally is a single number per element giving the number of seconds since the origin and thus needs less memory).
You've been given a couple of strategies that bring these character values in and process "in-place". I almost never use as.POSIXlt since there are so many pitfalls in dealing with the list-in-list structures that it returns, especially considering its effective incompatibility with dataframes. Here's a method that does the testing and coercion at the read.-level by defining an as-method:
setOldClass("inTime", prototype="POSIXct")
setAs("character", "inTime",
function(from) structure( ifelse( is.na(as.POSIXct(from, format="%m/%d/%Y %H:%M") ),
as.POSIXct(from, format="%m/%d/%Y") ,
as.POSIXct(from, format="%m/%d/%Y %H:%M") ),
class="POSIXct" ) )
read.csv(text=txt, colClasses=c("numeric", 'inTime','inTime') )
Id In Out
1 1 2015-08-15 08:00:00 2015-08-15 17:00:00
2 1 2015-08-16 08:04:00 2015-08-16 00:00:00
3 1 2015-08-17 08:50:00 2015-08-17 18:00:00
4 1 2015-08-18 00:00:00 2015-08-18 17:00:00
5 2 2015-08-15 00:00:00 2015-08-15 13:00:00
6 2 2015-08-16 08:00:00 2015-08-16 17:00:00
7 3 2015-08-15 04:00:00 2015-08-15 11:00:00
8 3 2015-08-16 09:00:00 2015-08-16 19:00:00
9 3 2015-08-17 00:00:00 2015-08-17 17:00:00
The structure "envelope" is needed because of the rather strange behavior of ifelse, which otherwise would return a numeric object rather than an object of class-'POSIXct'.
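The ifelse behaviour mentioned above is easy to reproduce on its own. The sketch below uses two made-up timestamps; it shows the class being dropped and then restored with structure(). Here both class names and the tzone attribute are set explicitly, which is a slight variation on the code above.

```r
x <- as.POSIXct(c("2015-08-15 08:00:00", "2015-08-16 00:00:00"), tz = "GMT")

# ifelse() keeps the values but drops the POSIXct class
stripped <- ifelse(is.na(x), x, x)
class(stripped)

# Re-attach the class (and time zone) to get date-times back
restored <- structure(stripped, class = c("POSIXct", "POSIXt"), tzone = "GMT")
restored
```

After the structure() call the values print as date-times again and compare equal to the originals.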
