I have an xts numeric matrix that includes multiple days of minute interval series. I need to calculate statistics for each day on the minute periods, add new columns, and then put all the day series back together.
I have tried apply.daily(), which calls my stats function with an xts matrix for each day, but I can't figure out how to return the modified day series to the invoking function and reassemble the full set of modified data.
One solution that could work is to use endpoints(x, on = "day") in a loop, then call rbind to reassemble the processed day frames. Is there a better solution?
process = function(myxts) {
  day.indexes = endpoints(myxts, on="days")
  days = length(day.indexes) - 1
  l = list()
  list.index = 1
  for( i in 1:days ) {
    day.begin = day.indexes[i] + 1
    day.end = day.indexes[i+1]
    l[[list.index]] = ets.sym.process.daily(myxts[day.begin:day.end])
    list.index = list.index + 1
  }
  return(do.call("rbind", l))
}
You should be able to use some combination of do.call(rbind, lapply(split(myxts,"days"), myfun)). It's hard to be more specific without a reproducible example.
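As a rough sketch of that split/lapply/rbind pattern: the minute data and the per-day function add_day_stats below are made up for illustration, so substitute your own statistics. It assumes the xts package is installed.

```r
library(xts)

# Made-up minute data covering exactly three calendar days (UTC)
set.seed(1)
idx <- seq(as.POSIXct("2013-01-02 00:00:00", tz = "UTC"),
           by = "1 min", length.out = 3 * 1440)
myxts <- xts(cumsum(rnorm(length(idx))), order.by = idx)
colnames(myxts) <- "price"

# Hypothetical per-day function: returns the day's rows plus two new columns
add_day_stats <- function(day) {
  p <- as.numeric(day[, "price"])
  stats <- xts(cbind(day.mean = rep(mean(p), length(p)),
                     dev      = p - mean(p)),
               order.by = index(day))
  merge(day, stats)
}

# Split into one xts per day, process each, then stack the pieces back together
result <- do.call(rbind, lapply(split(myxts, "days"), add_day_stats))
```

Because each piece keeps its own time index, rbind reassembles the rows in their original order.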
I frequently use to.daily to convert 1 min OHLC data to a daily format but am trying to find a way to do the same with overnight data. I was hoping to see the option to specify what time a "day" starts and ends but didn't see that.
Overnight session being 18:00 to 09:30.
Does anyone have a simple way to do this?
You could use time-of-day subsetting with which.i = TRUE to find all of the observations you don't want. Then subset the original data with the negative of the result, so all the non-overnight observations will be dropped.
# assume data are in a xts object named 'x'
# row numbers of the daytime (non-overnight) observations
DayObs <- x["T09:30/T18:00", which.i = TRUE]
Overnight <- x[-DayObs,]
You might need to change the start and end times in the time-of-day subset call.
If you already have your data subset so that it only includes the overnight session, you can aggregate to "daily" using period.apply() and custom endpoints. Assuming your data are in an object named x:
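# A session ends where the hour flips from morning (< 18) to evening (>= 18).
# This assumes the overnight session starts at 18:00 in the index time zone.
ep <- c(0, which(diff(.indexhour(x) >= 18) == 1), nrow(x))
# assumes x is a single-column price series
makeOHLC <- function(x) {
  op <- as.numeric(first(x))
  cl <- as.numeric(last(x))
  c(Open = op, High = max(x), Low = min(x), Close = cl)
}
period.apply(x, ep, makeOHLC)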
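Here is a small self-contained sketch of the custom-endpoints idea on made-up data. The series, the 18:00 session start, and the way the session boundary is detected (the hour jumping from morning to 18 or later) are all assumptions for illustration, so adjust them to your own session times.

```r
library(xts)

# Made-up minute series covering three calendar days (UTC), values 1, 2, 3, ...
idx <- seq(as.POSIXct("2016-01-04 00:00:00", tz = "UTC"),
           by = "1 min", length.out = 3 * 1440)
all_obs <- xts(seq_along(idx), order.by = idx)

# Keep only the overnight session: 18:00 onward, or up to and including 09:30
minute_of_day <- .indexhour(all_obs) * 60 + .indexmin(all_obs)
x <- all_obs[minute_of_day >= 18 * 60 | minute_of_day <= 9 * 60 + 30]

# A session ends at the last morning row before the hour jumps to 18 or later
ep <- c(0, which(diff(.indexhour(x) >= 18) == 1), nrow(x))

# Collapse one session (single-column series) into an OHLC row
makeOHLC <- function(x) {
  c(Open  = as.numeric(first(x)),
    High  = max(x),
    Low   = min(x),
    Close = as.numeric(last(x)))
}

ohlc <- period.apply(x, ep, makeOHLC)
```

Each row of ohlc is stamped with the time of the last observation in its session. The first and last rows here cover partial sessions, because the made-up data starts and ends at midnight.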
Assume the following dataset. I get closing prices for all working days, but there are no rows for dates without an observation. How can I add a row for every calendar date, all the way to the present? I need this because I want to average by week, and variable-length time windows make that impossible.
Here is my code:
library(quantmod)
from="2012-09-01"
sym = c("BARC")
prices = Map(function(n)
{
print(n)
tryCatch(getSymbols(n, src="google", env=NULL, from=from)[, 4], error =
function(e) NA)
}, sym)
N = length(prices)
# identify symbols returning valid data
i = ! unlist(Map(function(i) is.na(prices[i]), seq(N)))
# combine returned prices list into a matrix, one column for each symbol
prices = Reduce(cbind, prices[i])
colnames(prices) = sym[i]
If you look at the "prices" object, you will see the point I am making.
You can create a blank xts with all the dates first, and then merge with your prices object.
full_dates <- xts(,order.by = seq(from = start(prices), to = end(prices), by= "day"))
full_prices <- merge(full_dates,prices, all = TRUE)
You can also choose to fill forward the missing prices, by the following
na.locf(full_prices)
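A minimal sketch of the whole sequence on made-up prices, followed by the weekly average the question is after. The prices are invented, and apply.weekly() with mean is one way to do the averaging, not something the question specified.

```r
library(xts)

# Made-up closing prices for two Monday-to-Friday weeks (no weekend rows)
dates  <- as.Date("2012-09-03") + c(0:4, 7:11)
prices <- xts(c(10, 11, 12, 13, 14, 20, 21, 22, 23, 24), order.by = dates)
colnames(prices) <- "BARC"

# Empty xts holding every calendar day in the range
full_dates <- xts(, order.by = seq(from = start(prices), to = end(prices), by = "day"))

# Outer join adds NA rows for the missing days; na.locf carries the last close forward
full_prices <- na.locf(merge(full_dates, prices, all = TRUE))

# Average each calendar week over the now-regular daily series
weekly_mean <- apply.weekly(full_prices, mean)
```

Filling forward means weekend rows repeat Friday's close, which weights Friday more heavily in the weekly mean. If that is not what you want, skip na.locf() and pass na.rm = TRUE to mean instead.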
OK, so let's take the code below, which calculates a simple moving average over a rolling 2-day period:
# Use TTR package to create rolling SMA n day moving average
new.df$close.sma.n2 <- SMA(new.df[,"Close"], 2)
Let's say I want to calculate it for every n-day period from 2 to 30.
The inputs here are:
close.sma.n**
and also the numerical value for the SMA calculation.
So my question is:
How can I write one line of code that performs the above calculation for different SMA periods and also creates a new column in the data frame for each one, named close.sma.n2, close.sma.n3, close.sma.n4, and so on?
I understand I can do:
n.sma <- 2:30
and put that variable in:
new.df$close.sma.n2 <- SMA(new.df[,"Close"], n.sma)
Perhaps I can:
name <- "n2:30"
and place that inside:
new.df$close.sma.name <- SMA(new.df[,"Close"], n.sma)
You didn't provide sample data or SMA, so I made up dummy functions to test my code.
df <- data.frame(Close=c(1, 2, 3, 4))
SMA <- function(x, numdays) {numdays}
Then, I wrote a function that takes in the number of days to average, and returns a function that takes a data.frame and takes the SMA over that many days.
getSMA <- function(numdays) {
  function(new.df) {
    SMA(new.df[,"Close"], numdays)
  }
}
Then, create a matrix to put the SMAs in
smas <- matrix(nrow=nrow(df), ncol=0)
and fill it.
for (i in 2:30) {
  smas <- cbind(smas, getSMA(i)(df))
}
Then, set the column names to what you want them to be
colnames(smas) <- sapply(2:30, function(n)paste("close.sma.n", n, sep=""))
and bind it with the starting data frame.
df <- cbind(df, smas)
Now you have your original data.frame, plus "close.sma.n2" through ".n30"
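For comparison, here is a more compact sketch of the same idea using sapply() with the real SMA() from the TTR package instead of the dummy above. The Close values are made up, and the series needs at least as many rows as the longest period (30 here).

```r
library(TTR)

# Made-up closing prices; at least 30 rows so SMA(n = 30) is defined
set.seed(42)
new.df <- data.frame(Close = 100 + cumsum(rnorm(50)))

periods <- 2:30

# One column per period; sapply() collects the equal-length vectors into a matrix
smas <- sapply(periods, function(n) SMA(new.df[, "Close"], n))
colnames(smas) <- paste0("close.sma.n", periods)

new.df <- cbind(new.df, smas)
```

The first n - 1 entries of each close.sma.n column are NA, because the window is not yet full.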
I want to create a data frame of 15-minute intervals over 24 hours, starting with a certain interval on several dates. I use a loop for that, but instead of the actual intervals it stores the number of seconds, which is not useful in my case. Is there any way to avoid this? I need the intervals to count how often timed events happen in them. I found one similar question, but the answer concentrated on using lapply instead of apply, which isn't applicable here.
So here is a basic example:
begin<-as.POSIXct(rbind("2016-03-31 09:00:00","2016-04-12 09:00:00"))
end<-as.POSIXct(rbind("2016-03-31 09:15:00","2016-04-12 09:15:00"))
int<-as.interval(begin,end)
aufl<-duration(15, "mins")
Intervall=data.frame()
for (j in 1:length(int)){for (i in 1:96){Intervall[j,i]<-int_shift(int[j],aufl*(i-1))}}
Intervall
Here is an answer; I hope this is what you are looking for. If not, please comment:
library(lubridate)
begin <- as.POSIXct(rbind("2016-03-31 09:00:00","2016-04-12 09:00:00"))
# copy begin time for loop
begin_new <- begin
# create duration object
aufl <- duration(15, "mins")
# count times for loop
times <- 24*60/15
# create dataframe with begin time
Intervall <- data.frame(begin,stringsAsFactors = FALSE)
for (i in 1:times){
  cat("test",i,"\n")
  # save old time for interval calculation
  begin_start <- begin_new
  # add 15 minutes to the previous start time
  begin_new <- begin_new + aufl
  cat(begin_new,"\n")
  # create an interval object between the old and new times
  new_dur <- interval(begin_start,begin_new)
  # bind to original dataframe
  Intervall <- cbind(Intervall,new_dur)
}
# Add column names
vec_names <- paste0("v",c(1:(times+1)))
colnames(Intervall) <- vec_names
I want to split my large xts object in a list of regular one second periods containing all the observations of the original objects. The goal is to send each list element to nodes on my cluster for processing.
I came up with this solution, which is quite elaborate. I'm wondering if this code can be simplified:
library(xts)
set.seed(123)
myts = xts(1:10000, as.POSIXlt(1366039619, tz="EST", origin="1970-01-01") + rnorm(10000, 1, 100))
# ensure we have at least one observation per second
secs = seq(trunc(index(head(myts, 1))), trunc(index(tail(myts, 1))), by="s")
# generate second periods endpoints
myts = merge(myts, secs, fill=na.locf)
myts.aligned = align.time(myts, 1)
myts.ep = endpoints(myts.aligned, "seconds", 1)
# split large xts object in list of second periods
myts.list = lapply(1:(length(myts.ep)-1), function(x, myts, ep) { myts[ep[x]:ep[x+1],] }, myts, myts.ep)
# call to parLapply here...
I think this does what you want:
split(myts, "secs")
It will create a list where each component is 1 second of non-overlapping data.
See ?split.xts
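A quick sketch on made-up data to show what the split returns. The series below is invented, and the final per-chunk step is only a stand-in for whatever you send to the cluster.

```r
library(xts)

# Made-up irregular series: 1000 observations scattered over a 10-second window
set.seed(123)
start_time <- as.POSIXct(1366039619, origin = "1970-01-01", tz = "UTC")
idx  <- start_time + sort(runif(1000, min = 0, max = 10))
myts <- xts(seq_along(idx), order.by = idx)

# One list element per second that contains at least one observation
chunks <- split(myts, "secs")

# Stand-in for the per-chunk work: count the rows in each one-second chunk
sizes <- vapply(chunks, nrow, integer(1))
```

Note that seconds with no observations produce no list element at all, so the list can be shorter than the number of elapsed seconds. The resulting list can be passed to parLapply() in place of lapply(), provided xts is loaded on the worker nodes.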