How to apply a complex function over seasonal periods using the hydroTSM and xts packages in R?

I want to calculate the seasonal mean of my parameter values (only where x > 0.002). To do this, I use xts::period.apply() to split the values by season. I use the "quarter" period in endpoints(), but "quarter" divides the year into the following four seasons:
"January+February+March",
"April+May+June",
"July+August+September",
"October+November+December"
For example:
library(xts)
library(PerformanceAnalytics)
data(edhec)
head(edhec)
edhec_4yr <- edhec["1997/2001"]
ep <- endpoints(edhec_4yr, "quarter")
# mean
period.apply(edhec_4yr, INDEX = ep,
function(x) apply(x,2, function(y) mean(y[y>0.002])))
But for my study, I want the seasons divided as follows:
"December+January+February",
"March+April+May",
"June+July+August",
"September+October+November"
How can I change which months make up each "quarter" period?
With the hydroTSM package, I can apply simple functions (mean, max, min) using the following call:
dm2seasonal(edhec_4yr, FUN=mean, season="DJF")
Where:
DJF : December, January, February
MAM : March, April, May
JJA : June, July, August
SON : September, October, November
But I cannot apply a more complex function (a conditional mean) in the same way:
dm2seasonal(edhec_4yr, season="DJF",
function(x) apply(x,2, function(y) mean(y[y>0.002])))
How can I change this call so that it calculates the mean (only where x > 0.002) for DJF, for example?

The xts::endpoints() function always returns the last observation in a "standard" period, starting from the origin (midnight, 1970-01-01). So it can't easily do what you want.
You can calculate your own period end points by finding the observation on the last day of the last month in each 3-month window. Here's one way to do that with monthly data:
# .indexmon() returns a zero-based month
ep <- which((.indexmon(edhec_4yr) + 1) %in% c(2, 5, 8, 11))
aggfn <- function(x, bound = 0.002, ...) {
apply(x,2, function(y) mean(y[y > bound], ...))
}
period.apply(edhec_4yr, ep, aggfn)
If you have daily data, you need to find the last day of each month your periods end in. You can do that by using .indexmon() to find all months that end each season, then construct an xts object with the locations of all those observations in the original daily data object. Then you can use apply.monthly() and last() to extract the location of the last day of each season-ending month. The resulting object contains the end points you need to pass to period.apply().
data(prices)
prices <- as.xts(prices) # 'prices' is zoo; convert to xts
season_months <- (.indexmon(prices)+1) %in% c(2, 5, 8, 11)
ep_months <- xts(which(season_months), index(prices)[season_months])
ep_seasons <- as.numeric(apply.monthly(ep_months, last))
period.apply(prices, ep_seasons, aggfn)
And I should note that I'm thinking about how to specify end points in a more flexible manner, and I'll make sure to include a way to specify seasons.
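As an alternative to computing end points, a season label can be built directly from the dates and used as a grouping key. This is only a minimal base R sketch, not something provided by xts or hydroTSM: season_label() and mean_above() are hypothetical helper names, the values are made up, and December is assumed to count towards the DJF season of the following year.

```r
# Hypothetical helper: label each date with its meteorological season and the
# year in which that season ends (December is counted with the next year).
season_label <- function(dates) {
  lt <- as.POSIXlt(dates)
  month <- lt$mon + 1L            # POSIXlt months are zero-based
  year <- lt$year + 1900L         # POSIXlt years count from 1900
  season <- c("DJF", "DJF", "MAM", "MAM", "MAM", "JJA",
              "JJA", "JJA", "SON", "SON", "SON", "DJF")[month]
  year <- ifelse(month == 12L, year + 1L, year)
  paste(year, season)
}

# Hypothetical helper: mean of the values strictly above `bound`
mean_above <- function(y, bound = 0.002) mean(y[y > bound])

# Made-up monthly values, for illustration only
dates <- seq(as.Date("1997-01-01"), by = "month", length.out = 24)
values <- rep(c(0.001, 0.004, 0.010), length.out = 24)

# One conditional mean per season-year
seasonal_means <- tapply(values, season_label(dates), mean_above)
seasonal_means
```

With a multi-column object, the same grouping key could be applied column by column. Seasons at either end of the data may be incomplete (here the first DJF has no December), which is worth checking before interpreting the result.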

Related

Daily Time Series Analysis

I have a daily time series of the sales of a product, running from 01/01/2016 to 31/08/2017. My problem is that I do not know what frequency value to use, given that it is a six-day week (my week starts on Monday and ends on Saturday) and there is no data for Sundays.
Should it be like this?
myts <- ts(sales, start=c(2016, 1), frequency=6)
Thanks for your help!
ts expects you to have values for each element of the time-series, i.e., it would expect you to have the seventh day values in the data.
One option is to expand the date index to include your missing observations. You could fill those missing observations using na.approx or leave them as NA, but you can't give ts a six-day week and expect it to treat it as a seven-day cycle.
A good way to do this is to look at zoo, which has specific functions for dealing with these sorts of situations.
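The idea of expanding the index can be sketched in base R as follows. This is only an illustration under assumptions: the data are made up, the variable names are hypothetical, and the gaps are filled with stats::approx() (linear interpolation) rather than zoo's na.approx.

```r
# Made-up six-day-week data (no Sundays), for illustration only
dates <- seq(as.Date("2016-01-01"), as.Date("2016-03-31"), by = "day")
dates <- dates[format(dates, "%u") != "7"]   # %u: ISO weekday, 7 = Sunday
sales <- seq_along(dates)

# Full calendar index; days without an observation become NA
full_dates <- seq(min(dates), max(dates), by = "day")
sales_full <- sales[match(full_dates, dates)]

# Optionally fill the gaps by linear interpolation between observed days
sales_filled <- approx(x = as.numeric(dates), y = sales,
                       xout = as.numeric(full_dates))$y

# Seven observations per weekly cycle
sales_ts <- ts(sales_filled, frequency = 7)
```

Whether interpolating Sundays is sensible depends on the data; if the shop is simply closed, zeros or NA may describe the situation better than interpolated values.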
It really depends on what you want to do with the data.
1) plot If your objective is simply to plot the data, then the "ts" class is not a good fit since it is not good at representing dates. Try this instead, where the test vectors sales and tt are defined in the Note at the end.
library(zoo)
z <- zoo(sales, tt)
plot(z)
2) acf If you want to compute the autocorrelation function then using the plain vector sales or ts(sales) would be fine:
acf(sales)
3) StructTS If you want to fit a structural time series using StructTS then you will need to decide on the length of a cycle, i.e. does it repeat every week, quarter or year? Typically an annual cycle is appropriate for sales but, in general, you need at least two complete cycles to do anything, so you don't really have enough data for that.
4) monthly/quarterly If you are willing to reduce it to monthly or quarterly data then you could use ts but you only have 20 points for monthly or 7 for quarterly. Here we have used the last point in each month:
library(zoo)
z <- zoo(sales, tt)
zm <- aggregate(z, as.yearmon, tail, 1)
tsm <- as.ts(zm)
tsm
giving:
          Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
2016 3.258097 3.931826 4.356709 4.644391 4.867534 5.049856 5.204007 5.342334
2017 5.828946 5.897154 5.968708 6.030685 6.093570 6.150603 6.204558 6.257668
          Sep      Oct      Nov      Dec
2016 5.459586 5.564520 5.659482 5.749393
2017
5) weekly Another thing you could consider would be to use weekly series by just using Saturday, for example:
library(zoo)
z <- zoo(sales, tt)
zw <- z[weekdays(time(z)) == "Saturday"]
Note: We used this dummy data:
set.seed(123)
tt <- seq(as.Date("2016-01-01"), as.Date("2017-08-31"), "day")
tt <- tt[! weekdays(tt) == "Sunday"]
n <- length(tt)
sales <- log(1:n)

How do I add periods to time series in R after aggregation

I have a two variable dataframe (df) in R of daily sales for a ten year period from 2004-07-09 through 2014-12-31. Not every single date is represented in the ten year period, but pretty much most days Monday through Friday.
My objective is to aggregate sales by quarter, convert to a time series object, and run a seasonal decomposition and other time series forecasting.
I am having trouble with the conversion, as ultimately I receive an error:
time series has no or less than 2 periods
Here's the structure of my code.
# create a time series object
library(xts)
x <- xts(df$amount, df$date)
# create a time series object aggregated by quarter
q.x <- apply.quarterly(x, sum)
When I try to run
fit <- stl(q.x, s.window = "periodic")
I get the error message
series is not periodic or has less than two periods
When I try to run
q.x.components <- decompose(q.x)
# or
decompose(x)
I get the error message
time series has no or less than 2 periods
So, how do I take my original dataframe, with a date variable and an amount variable (sales), aggregate that quarterly as a time series object, and then run a time series analysis?
I think I was able to answer my own question. I did this. Can anyone confirm if this structure makes sense?
library(lubridate)
# add a new variable indicating the calendar year.quarter (i.e. 2004.3) of each observation
df$year.quarter <- quarter(df$date, with_year = TRUE)
library(plyr)
# summarize gift amount by year.quarter
new.data <- ddply(df, .(year.quarter), summarize,
sum = round(sum(amount), 2))
# convert the new data to a quarterly time series object beginning
# in July 2004 (2004, Q3) and ending in December 2014 (2014, Q4)
nd.ts <- ts(new.data$sum, start = c(2004,3), end = c(2014,4), frequency = 4)
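The structure above looks reasonable. The errors most likely arise because stl() and decompose() need a ts object whose frequency is greater than 1, so that they know how many observations make up one seasonal cycle. Below is a minimal base R sketch of the same pipeline without lubridate or plyr; the data frame is made up and the column name year_quarter is a hypothetical choice.

```r
# Made-up weekday sales, for illustration only
set.seed(42)
dates <- seq(as.Date("2004-07-09"), as.Date("2014-12-31"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]   # drop weekends
df <- data.frame(date = dates, amount = runif(length(dates), 10, 100))

# Quarter label such as "2004 Q3"; these labels sort chronologically
df$year_quarter <- paste(format(df$date, "%Y"), quarters(df$date))
quarterly <- aggregate(amount ~ year_quarter, data = df, FUN = sum)

# frequency = 4: four observations make up one yearly cycle
q_ts <- ts(quarterly$amount, start = c(2004, 3), frequency = 4)

# Seasonal decomposition now has enough periods to work with
fit <- stl(q_ts, s.window = "periodic")
```

This assumes every quarter between the first and last date has at least one observation; a quarter with no rows would be missing from the aggregate and would shift the later values in the ts object.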

Remove incomplete month from monthly return calculation

I have some code for grabbing stock prices and calculating monthly returns. I would like to drop the last return if the price used to calculate it did not occur at month end. For example, running the code below returns prices through 2014-06-13. And, the monthlyReturn function calculates a return for June even though there hasn't been a full month. Is there an easy way to make sure monthlyReturn is only computing returns on full months or to drop the last month from the return vector if it wasn't calculated on a full month of prices?
library(quantmod)
symbols <- c('XLY', 'XLP', 'XLE', 'XLF', 'XLV', 'XLI', 'XLB', 'XLK', 'XLU')
Stock <- xts()
Prices <- xts()
for (i in 1:length(symbols)){
Stock <- getSymbols(symbols[i],auto.assign = FALSE)
Prices <- merge(Prices,Stock[,6])
}
returns <- do.call(cbind, lapply(Prices, monthlyReturn, leading=FALSE))
names(returns) <- symbols
I found this bit of code, but it seems to have some limitations. Is there a way to improve this?
if(tail(index(x.xts),1) != as.Date(as.yearmon(tail(index(x.xts),1)), frac=1)){
x.m.xts = x.m.xts[-dim(x.m.xts)[1],]
}
# That test isn't quite right, but it's close. It won't work on the first
# day of a new month when the last business day wasn't the last day of
# the month. It will work for the second day.
You can use negative subsetting with xts:::last.xts. This will remove the last month:
last(returns, "-1 months")
But you only want to remove the last month if the month hasn't ended yet, so compare the month of the last row, with the month of the current date.
if (format(end(returns), "%Y%m") == format(Sys.Date(), "%Y%m"))
returns <- last(returns, "-1 month")
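The same month comparison can be sketched in base R on plain vectors, which makes it easy to try with a fixed "current" date. The function name drop_current_month() and its today argument are hypothetical, and the return values are made up.

```r
# Hypothetical helper: drop the final monthly value when its month is still
# in progress relative to `today` (defaults to the system date).
drop_current_month <- function(values, dates, today = Sys.Date()) {
  last_date <- dates[length(dates)]
  same_month <- format(last_date, "%Y%m") == format(today, "%Y%m")
  if (same_month) values[-length(values)] else values
}

# Made-up monthly returns; the last one is based on a price from 2014-06-13
return_dates <- as.Date(c("2014-04-30", "2014-05-30", "2014-06-13"))
monthly_returns <- c(0.012, -0.004, 0.007)

# Still June: the partial June return is dropped
in_june <- drop_current_month(monthly_returns, return_dates,
                              today = as.Date("2014-06-14"))

# Already July: all three values are kept
in_july <- drop_current_month(monthly_returns, return_dates,
                              today = as.Date("2014-07-01"))
```

The second call shows a limitation of comparing against the current date: if the prices have not been refreshed since mid-June, the partial June return is kept once the calendar reaches July.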

How do I subset every day except the last five days of zoo data?

I am trying to extract all dates except for the last five days from a zoo dataset into a single object.
This question is somewhat related to How do I subset the last week for every month of a zoo object in R?
You can reproduce the dataset with this code:
library(zoo)
set.seed(123)
price <- rnorm(365)
dates <- seq(as.Date("2013-01-01"), by = "day", length.out = 365)
zoodata <- zoo(price, dates)
For my output, I'm hoping to get a combined dataset of everything except the last five days of each month. For example, if there are 20 days in the first month's data and 19 days in the second month's, I only want to subset the first 15 and 14 days of data respectively.
I tried using the head() function and the first() function to extract the first three weeks, but since each month will have a different amount of days according to month or leap year months, it's not ideal.
Thank you.
Here are a few approaches:
1) as.Date Let tt be the dates. Then we compute a Date vector the same length as tt which has the corresponding last date of the month. We then pick out those dates which are at least 5 days away from that:
tt <- time(zoodata)
last.date.of.month <- as.Date(as.yearmon(tt), frac = 1)
zoodata[ last.date.of.month - tt >= 5 ]
2) tapply/head For each month tapply head(x, -5) to the data and then concatenate the reduced months back together:
do.call("c", tapply(zoodata, as.yearmon(time(zoodata)), head, -5))
3) ave Define revseq which given a vector or zoo object returns sequence numbers in reverse order so that the last element corresponds to 1. Then use ave to create a vector ix the same length as zoodata which assigns such reverse sequence numbers to the days of each month. Thus the ix value for the last day of the month will be 1, for the second last day 2, etc. Finally subset zoodata to those elements corresponding to sequence numbers greater than 5:
revseq <- function(x) rev(seq_along(x))
ix <- ave(seq_along(zoodata), as.yearmon(time(zoodata)), FUN = revseq)
z <- zoodata[ ix > 5 ]
ADDED Solutions (1) and (2).
Exactly the same way as in the answer to your other question:
Split dataset by month, remove last 5 days, just add a "-":
library(xts)
xts.data <- as.xts(zoodata)
lapply(split(xts.data, "months"), last, "-5 days")
And the same way, if you want it on one single object:
do.call(rbind, lapply(split(xts.data, "months"), last, "-5 days"))
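One detail that may matter with real data: a rule based on calendar dates and a rule based on counting observations give the same result only when every calendar day is present. The base R sketch below uses made-up weekday-only dates and hypothetical variable names to show the difference.

```r
# Made-up dates with weekends removed, for illustration only
dates <- seq(as.Date("2013-01-01"), as.Date("2013-02-28"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]
month_id <- format(dates, "%Y-%m")

# Last calendar day of each date's month: the first of the month plus 31 days
# always lands in the next month, whose first day minus one is the month end.
month_start <- as.Date(paste0(month_id, "-01"))
month_end <- as.Date(format(month_start + 31, "%Y-%m-01")) - 1

# Calendar rule: keep dates at least 5 calendar days before the month end
keep_calendar <- as.numeric(month_end - dates) >= 5

# Observation rule: number the rows of each month from the end, keep rank > 5
rank_from_end <- ave(seq_along(dates), month_id,
                     FUN = function(i) rev(seq_along(i)))
keep_obs <- rank_from_end > 5

# How many rows each rule drops per month
table(month_id[!keep_calendar])
table(month_id[!keep_obs])
```

With gaps in the data, the calendar rule drops fewer rows per month than the observation rule, so it is worth deciding which meaning of "the last five days" is intended.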

Calculate average value over multiple years for each hour and day

I am trying to calculate an average over multiple years for hourly data. I want to retain the days and hours and average over the years. I feel like this should be simple but I have looked around for an answer and not found one.
I am using R version 3.0.3.
start <- ISOdatetime(1970, 1, 1, hour=0, min=0, sec=0, tz="GMT")
end <- ISOdatetime(1971, 12, 31, hour=18, min=0, sec=0, tz="GMT")
set.seed(1)
z <- zooreg(rnorm(2920), start = start , end = end, frequency = 4, deltat = 21600)
#attempt to aggregate ... doesn't work
z.daily.agg <- aggregate(z, as.POSIXct(cut(time(z), "6 hours", include=T)), mean)
What I would like for the output is the following:
01-01 00:00 average of all January 1st zero hours from 1970-1971
01-01 06:00 average of all January 1st 06:00 values from 1970-1971
Thanks for your assistance with this!
I believe this will work - using the hour function from the lubridate package.
require(lubridate)
aggregate(z, hour(index(z)), mean)
Edit in response to your comments - sorry, I didn't realise exactly what you wanted. You can average across each hour by day by month across the two years (which I think is what you want) like so:
aggregate(z ~ month(index(z)) + day(index(z)) + hour(index(z)), FUN = 'mean')
Hope that helps
A little crude but you could
#1) Use the substr function to extract the parts of the date string you want:
date = substr(time(z), 6,16)
#2) Then bind this to the data:
temp = data.frame(z, date)
#3) Make sure the date is a factor:
temp$date = as.factor(temp$date)
#4) And now aggregate:
aggregate(temp$z~temp$date, FUN=mean)
Does this give you the results you were after?
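The grouping idea behind both answers can also be written with format() and tapply() in base R. This is a minimal sketch on made-up 6-hourly data; the name slot for the "month-day hour" key is a hypothetical choice.

```r
# Made-up 6-hourly data over two non-leap years, for illustration only
times <- seq(ISOdatetime(1970, 1, 1, 0, 0, 0, tz = "GMT"),
             ISOdatetime(1971, 12, 31, 18, 0, 0, tz = "GMT"),
             by = "6 hours")
set.seed(1)
values <- rnorm(length(times))

# Key that keeps month, day and hour but drops the year, e.g. "01-01 06:00"
slot <- format(times, "%m-%d %H:%M", tz = "GMT")

# One mean per slot, averaged over all years present in the data
slot_means <- tapply(values, slot, mean)
head(slot_means)
```

If the period includes a leap year, the "02-29" slots are averaged over fewer years than the others.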
