I frequently use to.daily() to convert 1-minute OHLC data to a daily format, but I'm trying to find a way to do the same with overnight data. I was hoping to find an option to specify what time a "day" starts and ends, but didn't see one.
Overnight session being 18:00 to 09:30.
Does anyone have a simple way to do this?
You could use time-of-day subsetting with which.i = TRUE to find the row indices of all the observations you don't want, then subset the original data with the negative of those indices so that only the overnight observations are kept.
# assume data are in an xts object named 'x'
DayObs <- x["T09:30/T18:30", which.i = TRUE]
Overnight <- x[-DayObs,]
You might need to change the start and end times in the time-of-day subset call.
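For the 18:00 to 09:30 session in the question, that might look something like the sketch below. Treat the exact boundary times as an assumption: which side of the boundary the 09:30 and 18:00 bars fall on depends on how your bars are stamped.
# sketch: drop the 09:30-17:59 day-session bars, keeping the overnight bars
DayObs    <- x["T09:30/T17:59", which.i = TRUE]
Overnight <- x[-DayObs, ]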
If you already have your data subset so that it only includes the overnight session, you can aggregate to "daily" using period.apply() and custom endpoints. Assuming your data are in an object named x:
# endpoints at the last bar of each overnight session: the index jumps from a
# morning hour (the ~09:30 close) back to an evening hour (the next 18:00 open);
# include 0 and nrow(x) so the first and last sessions aren't dropped
ep <- c(0, which(diff(.indexhour(x) >= 12) == 1), nrow(x))
makeOHLC <- function(x) {
  # first/last values become the open/close; max/min become the high/low
  op <- as.numeric(first(x))
  cl <- as.numeric(last(x))
  c(Open = op, High = max(x), Low = min(x), Close = cl)
}
period.apply(x, ep, makeOHLC)
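Note that period.apply() stamps each output row with the timestamp of the last bar in each period (around 09:30 here). If you prefer the daily bars to be indexed by calendar date, a small hedged re-stamping sketch ('result' is just a placeholder name for the period.apply() output):
result <- period.apply(x, ep, makeOHLC)
# re-index by calendar date; as.Date() on POSIXct defaults to UTC,
# so pass tz = if your data's time zone matters
daily <- xts(coredata(result), order.by = as.Date(index(result)))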
I have a data.frame in R which includes two variables with a Start date and an End date. I would like to add a new column with the number of days between the two dates, reduced by the number of Sundays in each interval. I tried it like below but it doesn't work:
Data$Start <- as.Date(Data$Start, "%d.%m.%y")
Data$End <- as.Date(Data$End,"%d.%m.%y")
interval <- difftime(Data$Start, Data$End, units = "days")
sundays <- seq(from = Data$Start, to = Data$End, by = "days")
number.sundays <- length(which(wday(sundays)==1))
Data$DaysAhead <- interval - number.sundays
I get an error message from the seq() function that the argument must have length 1, but I don't understand how to handle this. Can someone help me out with that?
Here's an example that works:
Data <- data.frame(
Start = c("01.01.2020", "01.06.2020"),
End = c("01.03.2020", "01.09.2020")
)
Data$Start <- as.Date(Data$Start, "%d.%m.%Y")
Data$End <- as.Date(Data$End,"%d.%m.%Y")
interval <- difftime(Data$End, Data$Start, units = "days")
sundays <- lapply(1:nrow(Data), function(i)seq(from = Data$Start[i], to = Data$End[i], by = "days"))
number.sundays <- sapply(sundays, function(x)length(which(lubridate::wday(x)==1)))
Data$DaysAhead <- interval - number.sundays
The problem is that seq() isn't vectorized; it assumes a single start and a single end point. If you put it inside a loop (like lapply()) it will work and generate the relevant sequence for each start and end date. Then you can use sapply() to figure out how many Sundays fall in each sequence, and since the returned value is a scalar, the return from sapply() will be a vector of the same length as interval.
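If you prefer to keep everything in one pass and skip lubridate, a minimal alternative sketch (assuming Start and End are already Date columns) could use mapply() to count the Sundays row by row:
# count Sundays per row; format(..., "%w") returns "0" for Sunday
number.sundays <- mapply(
  function(s, e) sum(format(seq(s, e, by = "day"), "%w") == "0"),
  Data$Start, Data$End
)
Data$DaysAhead <- as.numeric(Data$End - Data$Start) - number.sundays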
I realized with an updated data set that there's a problem with the solution above when the Start date and End date aren't in the same year. I still want to count the days except Sundays, starting for example on 20.12.2020 and running until 10.01.2021. The error message showing up in that case is that the sign of the "by" argument is wrong. I just can't manage to get it running. If I turn the dates around, the output makes no sense and the number of days is too high. What do I have to do to get this running over the year-end?
Assume the following dataset. I get closing prices for all working days, but there are missing rows for dates with no observation. How can I add rows for every calendar day, all the way to the present? The reason I need this is that I need to average by week, and having variable time windows makes that impossible.
Here is my code:
library(quantmod)
from="2012-09-01"
sym = c("BARC")
prices = Map(function(n) {
  print(n)
  tryCatch(getSymbols(n, src = "google", env = NULL, from = from)[, 4],
           error = function(e) NA)
}, sym)
N = length(prices)
# identify symbols returning valid data
i = ! unlist(Map(function(i) is.na(prices[i]), seq(N)))
# combine returned prices list into a matrix, one column for each symbol
prices = Reduce(cbind, prices[i])
colnames(prices) = sym[i]
If you see the "prices" data frame you will see the point I am making.
You can create a blank xts with all the dates first, and then merge with your prices object.
# empty xts with one row per calendar day spanning the price history
full_dates <- xts(, order.by = seq(from = start(prices), to = end(prices), by = "day"))
full_prices <- merge(full_dates, prices, all = TRUE)
You can also choose to fill the missing prices forward with the following:
na.locf(full_prices)
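Since the end goal in the question is weekly averages, a hedged follow-up sketch building on the objects above ('weekly_avg' is just an illustrative name):
filled <- na.locf(full_prices)
# weekly means of the gap-filled series
weekly_avg <- apply.weekly(filled, colMeans)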
I want to split my large xts object into a list of regular one-second periods containing all the observations of the original object. The goal is to send each list element to nodes on my cluster for processing.
I came up with this solution, which is quite elaborate. I'm wondering if this code can be simplified:
library(xts)
set.seed(123)
myts = xts(1:10000, as.POSIXlt(1366039619, tz = "EST", origin = "1970-01-01") + rnorm(10000, 1, 100))
# ensure we have at least one observation per second
secs = seq(trunc(index(head(myts, 1))), trunc(index(tail(myts, 1))), by="s")
# generate second periods endpoints
myts = merge(myts, secs, fill=na.locf)
myts.aligned = align.time(myts, 1)
myts.ep = endpoints(myts.aligned, "seconds", 1)
# split large xts object in list of second periods
myts.list = lapply(1:(length(myts.ep)-1), function(x, myts, ep) { myts[ep[x]:ep[x+1],] }, myts, myts.ep)
# call to parLapply here...
I think this does what you want:
split(myts, "secs")
It will create a list where each component is 1 second of non-overlapping data.
See ?split.xts
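Since the stated goal is to ship each element to cluster nodes, here is a minimal sketch of that step, assuming a local PSOCK cluster and a trivial per-chunk computation (just a sum) standing in for your real processing:
library(parallel)
cl <- makeCluster(2)
clusterEvalQ(cl, library(xts))
chunks <- split(myts, "secs")
# replace sum() with whatever per-second processing you need
results <- parLapply(cl, chunks, function(chunk) sum(chunk))
stopCluster(cl)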
I have a data frame (71568x4) consisting of several variables observed every hour (24 observations per day), with many NAs.
I want to find the maximum value in every 24 hours (in other words, the daily maximum) for each variable. If 12 or more hourly observations are missing during a 24-hour period on any day, the data for that day is considered missing and hence reported as NA. Can anyone help me do this in R?
Here is a sample example:
tDate <- rep(c(19980101,19980102,19980103), each = 24)
tTime <- rep(seq(1:24), 3)
x1 <- c(c(1:4),rep(NA,7),c(2:10),6,2,9,1,rep(NA,4),c(4:23),c(2:8),
rep(NA,7),c(3:5),rep(NA,7))
x2 <- c(rep(NA,3),c(11:15),NA,c(3:15),rep(NA,10),c(7:10),NA,c(2:4),NA,3,
rep(NA,5),c(6:9),NA,c(8:20),rep(NA,5),5,1)
datmat <- cbind(tDate,tTime,x1,x2)
The output should look like this:
> matrix(c(10,23,NA,15,NA,20), byrow = FALSE, ncol = 2)
Many thanks in advance.
I'd define a custom function to take the max that you want:
my.max <- function(vec) {
  # return NA when 12 or more hourly values are missing, otherwise the max
  if (sum(is.na(vec)) >= 12) {
    return(NA)
  } else {
    return(max(vec, na.rm = TRUE))
  }
}
Then use plyr and specifically ddply:
library(plyr)
ddply(as.data.frame(datmat), .(tDate), summarise, x1 = my.max(x1), x2 = my.max(x2))
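If you would rather avoid plyr, a base-R sketch along the same lines should give the same result (na.action = na.pass is needed so the NAs reach my.max() instead of being dropped beforehand):
# daily maxima per variable with the same NA rule, using aggregate()
aggregate(cbind(x1, x2) ~ tDate, data = as.data.frame(datmat),
          FUN = my.max, na.action = na.pass)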
I have an xts numeric matrix that includes multiple days of minute interval series. I need to calculate statistics for each day on the minute periods, add new columns, and then put all the day series back together.
I have tried apply.daily(), which calls my stats function with an xts matrix for each day, but I can't figure out how to return the modified day series back to the invoking function and reassemble the full set of modified data.
One solution that could work is to use endpoints(x, on = "day") in a loop, then call rbind to reassemble the processed day frames. Is there a better solution?
process = function(myxts) {
  day.indexes = endpoints(myxts, on = "days")
  days = length(day.indexes) - 1
  l = list()
  list.index = 1
  for (i in 1:days) {
    day.begin = day.indexes[i] + 1
    day.end = day.indexes[i + 1]
    l[[list.index]] = ets.sym.process.daily(myxts[day.begin:day.end])
    list.index = list.index + 1
  }
  return(do.call("rbind", l))
}
You should be able to use some combination of do.call(rbind, lapply(split(myxts,"days"), myfun)). It's hard to be more specific without a reproducible example.
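As a hedged sketch of what that could look like in your process() function, with ets.sym.process.daily() standing in for your per-day logic and assuming it returns an xts so rbind can reassemble the pieces:
process <- function(myxts) {
  # split into one xts per calendar day, apply the per-day function,
  # then bind the processed pieces back together in order
  do.call(rbind, lapply(split(myxts, f = "days"), ets.sym.process.daily))
}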